How to limit an inner join in Sybase?

How can I limit an inner join or a subquery so that it only selects one row? It seems I can't use 'top 1' in subqueries in my Sybase version (Adaptive Server Enterprise/15.5/EBF 19902).
Example:
select * from A a
inner join B b on a.id = b.Aid
where table B has two records linked to the same record in table A (same Aid). But I'd like to join only one of these records.
I tried to replace the inner join with a subquery using top 1, but this is not allowed.

I found a solution here: https://www.periscopedata.com/blog/4-ways-to-join-only-the-first-row-in-sql.html
select * from A a
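-- assumes B has a unique key column named id; min(Aid) grouped by Aid would return every Aid and filter nothing
inner join (select * from B b where b.id in (select min(id) from B group by Aid) )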
as b on b.Aid = a.id
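The "keep one B row per Aid" idea can be checked on a tiny in-memory dataset. This is only a sketch: it uses Python's sqlite3 rather than Sybase, and it assumes B has a unique key column `id` and a payload column `val`, neither of which appears in the original post. The sample rows are made up.

```python
import sqlite3

def build_sample_db():
    """Create hypothetical tables A and B; B has two rows for A.id = 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE A (id INTEGER PRIMARY KEY);
        CREATE TABLE B (id INTEGER PRIMARY KEY, Aid INTEGER, val TEXT);
        INSERT INTO A VALUES (1), (2);
        INSERT INTO B VALUES (1, 1, 'x'), (2, 1, 'y'), (3, 2, 'z');
    """)
    return conn

def join_first_b_row(conn):
    """Return one joined row per A record, keeping the B row with the smallest id."""
    sql = """
        SELECT a.id, fb.id, fb.val
        FROM A a
        INNER JOIN (
            SELECT * FROM B
            WHERE id IN (SELECT MIN(id) FROM B GROUP BY Aid)
        ) AS fb ON fb.Aid = a.id
        ORDER BY a.id
    """
    return conn.execute(sql).fetchall()
```

Calling `join_first_b_row(build_sample_db())` returns a single row for each A record, even though A.id = 1 has two matching B rows.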

Came across this post while going from "Sybase ASE doesn't support ROW_NUMBER()" to "TOP is not allowed in subqueries", to how tf do Sybase engineers expect us to limit a subquery result to 1 record? All solutions I've seen rely on min/max, but I haven't seen anything supporting an "ORDER BY" type of sorting.
So a simple
ROW_NUMBER() OVER (ORDER BY t.SEQUENCE, t.FROMDATE, t.FROMTIME)
becomes
select MYVALUE
from (SELECT someExpression as MYVALUE,
RIGHT(CONVERT(VARCHAR,1000000+t.SEQUENCE), 6) || CONVERT(VARCHAR,t.FROMDATE, 23) || CONVERT(VARCHAR,t.FROMTIME) as SORTKEY
FROM MY_TABLE t)
having SORTKEY = MIN(SORTKEY)
which is rather ugly, using all sorts of hacks to support string-sorting of the ORDER BY fields. As this will be used in subqueries, table alias scoping will mean that table joins need to be replicated.
The only alternative I can think of is a cursor with a break-condition so only the first row is processed, but that'll slow down things considerably.
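To illustrate why the sort key needs the padding hack, here is a small Python sketch of the same idea. It assumes the sequence is a non-negative integer below 1,000,000 and that the date and time are rendered as fixed-width strings with the most significant part first; the function names and the sample rows are hypothetical.

```python
def sort_key(sequence, from_date, from_time):
    """Build a string that sorts like ORDER BY sequence, from_date, from_time.

    Adding 1000000 and keeping the last 6 characters zero-pads the sequence,
    mirroring RIGHT(CONVERT(VARCHAR, 1000000 + SEQUENCE), 6).
    """
    padded = str(1000000 + sequence)[-6:]
    return padded + from_date + from_time

def first_row(rows):
    """Pick the row with the smallest sort key, like HAVING SORTKEY = MIN(SORTKEY)."""
    return min(rows, key=lambda r: sort_key(r["sequence"], r["from_date"], r["from_time"]))
```

Without the padding, the string "10" sorts before "2", so the row with sequence 10 would wrongly come first.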

Related

Merge Join over sorted columns instead of Hash Join

I have two tables
Table A
(
id int,
name varchar(39),
lname varchar (49),
...
)
Table B
(
id int,
city varchar(39),
...
)
Both tables are sorted on column ID. IDs are simply identities and are populated by auto incremented integers 1 to n.
However, if I input a query e.g.,
SELECT *
FROM A, B
WHERE A.id = B.id;
I get a hash join instead of the efficient merge join. How can I enforce the merge join in SQL Server instead? I don't want to use an index, thus no index-based plans.
Note that I don't want a merge join with a sort enforcer either. I know that one can hint the planner by rewriting the query to
SELECT *
FROM A
INNER MERGE JOIN B ON A.ID = B.ID;
By the way, I'm using SQL Server Express edition. But I can change to any open source DB if it supports the query plan that I'm aiming for.
Thanks in advance
If you believe you are smarter than the SQL engine :-) you can use hints like this:
SELECT *
FROM A
INNER HASH JOIN B
ON A.id = B.id
OR
SELECT *
FROM A
INNER MERGE JOIN B
ON A.id = B.id
At least you can test whether the MERGE join is really better. And even if it is better in this case, that does not mean it will always be the best choice. It can reduce performance in other cases, so it is generally better to leave this work to the engine.
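For reference, the merge join the question asks about relies on both inputs already being ordered on the join key. A minimal Python sketch of that algorithm follows; it is not how SQL Server implements it, and it assumes ids are unique within each input (as with the identity columns in the question). The function name and tuple layout are hypothetical.

```python
def merge_join(left, right):
    """Join two lists of (id, payload) tuples that are already sorted by id.

    Assumes each id appears at most once per input.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lid, rid = left[i][0], right[j][0]
        if lid == rid:
            # Matching keys: emit the joined row and advance both cursors.
            out.append((lid, left[i][1], right[j][1]))
            i += 1
            j += 1
        elif lid < rid:
            i += 1  # left key is behind, advance left
        else:
            j += 1  # right key is behind, advance right
    return out
```

Each input is read once from start to end, which is why the plan is attractive when the sort order is already guaranteed. The optimizer can only skip the sort if it knows about that order, which in practice usually means an index.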

SQL Server : can't find the right 'join' formula for what I want

I'm trying to find the right way to achieve this. Suppose I have 3 tables A, B and C.
I want my query to show some info from all 3 tables, but I want to show only one line per record in A.
The problem, if I join the tables, is that most of the time there are a lot of B records linked to one A record. Even worse, there are a lot of C records linked to one B, so sometimes the same A record is shown over a hundred times...
I tried select top(1) for B and top(1) again for C, but it still returns the top(1) values repeated on each of the 100 rows for the same A. I also tried left join, inner join...
I'm trying to figure out how to use group by but still can't find the right grouping. I ended up writing A LOT of nested selects; in fact, my query contains more nested selects than anything else... It works, but it takes forever...
Would it be faster if I find a way to remove most of my nested select ?
Is that even possible? I mean, did someone ever manage to accomplish this one line for all 'A' records query?
Try this:
Select * FROM A
OUTER APPLY (Select TOP 1 * FROM B Where A.colX = B.ColY) as New_B
OUTER APPLY (Select TOP 1 * FROM C Where A.colX = C.ColY) as New_C
You may need to modify the New_B and New_C select statements to match your requirements.
You can use a common table expression (CTE) with row_number, something like this:
;with cte as (
select a.id,b.name,c.price,
row_number() over(partition by a.id order by b.name, c.price) rn
from a inner join b on a.id = b.a_id
inner join c on b.id = c.b_id
)
select * from cte
where rn=1
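The row_number approach can be tried out on a small in-memory database. The sketch below uses Python's sqlite3 (window functions need SQLite 3.25 or later) rather than SQL Server; the table and column names follow the CTE above, while the sample rows are invented for illustration.

```python
import sqlite3

def build_abc_db():
    """Create hypothetical tables a, b, c with several b/c rows per a."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY);
        CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, name TEXT);
        CREATE TABLE c (id INTEGER PRIMARY KEY, b_id INTEGER, price REAL);
        INSERT INTO a VALUES (1), (2);
        INSERT INTO b VALUES (10, 1, 'beta'), (11, 1, 'alpha'), (12, 2, 'gamma');
        INSERT INTO c VALUES (100, 10, 5.0), (101, 11, 9.0),
                             (102, 11, 3.0), (103, 12, 7.0);
    """)
    return conn

def first_row_per_a(conn):
    """Number the joined rows within each a.id and keep only the first one."""
    sql = """
        WITH cte AS (
            SELECT a.id AS a_id, b.name, c.price,
                   ROW_NUMBER() OVER (PARTITION BY a.id
                                      ORDER BY b.name, c.price) AS rn
            FROM a
            INNER JOIN b ON a.id = b.a_id
            INNER JOIN c ON b.id = c.b_id
        )
        SELECT a_id, name, price FROM cte WHERE rn = 1 ORDER BY a_id
    """
    return conn.execute(sql).fetchall()
```

The ORDER BY inside OVER decides which of the many joined rows survives for each A record, so it should list whatever makes a row "first" for your purposes.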

Shortcut for adding table to column name SQL-server 2014

Stupidly simple question, but I just don't know what to google!
If I create a query like this:
Select id, data
from table1
Now I want to join with table2. I can immediately see that the id column is no longer unique and I have to change it to
table1.id
Is there any smart way (like a keyboard shortcut) to do this, instead of manually adding table1 to every column? Either before I add the join, to ensure that all columns will be unique, or after, with suggestions based on the different possible tables.
No, there is no helper.
But note that you can alias the table name:
select x.Col1, y.Col2
from ALongTableName x
inner join AReallyReallyLongTableName y on x.Id = y.OtherId
which can also make queries clearer, and is very much necessary when doing self joins.
First of all, you should start using aliases:
SQL aliases are used to give a database table, or a column in a table,
a temporary name.
Basically aliases are created to make column names more readable.
This will narrow down your problem and make your code maintenance easier. If that's not enough, I guess you could start using auto-completion tools, such as these:
SQL Complete
SQL Prompt
ApexSQL Complete
These have your desired functionality, however, they do not always work as expected (at least for me).
You can use a table alias, like this:
SELECT A.ID, A.data
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
You only need the A. or B. prefix when both tables have a column with the same name. If the column names differ, you don't need it, like this:
SELECT A.ID, data -- works if TableB has no column named data
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
Or:
Select A.*, B.ID
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID

Filtering on the join?

Is there any argument, performance wise, to do filtering in the join, as opposed to the WHERE clause?
For example,
SELECT blah FROM TableA a
INNER JOIN TableB b
ON b.id = a.id
AND b.deleted = 0
WHERE a.field = 5
As opposed to
SELECT blah FROM TableA a
INNER JOIN TableB b
ON b.id = a.id
WHERE a.field = 5
AND b.deleted = 0
I personally prefer the latter, because I feel filtering should be done in the filtering section (WHERE), but are there any performance or other reasons to prefer either method?
If the query optimizer does its job, there is no difference at all (except clarity for others) in the two forms for inner joins.
That said, with left joins a condition in the join means to filter rows out of the second table before joining. A condition in the where means to filter rows out of the final result after joining. Those mean very different things.
With inner joins you will have the same results and probably the same performance. However, with outer joins the two queries would return different results and are not equivalent at all as putting the condition in the where clause will in essence change the query from a left join to an inner join (unless you are looking for the records where some field is null).
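The difference between the two placements is easy to see on a two-row example. The following sketch uses Python's sqlite3 with the table and column names from the question; the sample rows and the helper names are made up for illustration.

```python
import sqlite3

def build_filter_db():
    """TableB has one live row (deleted = 0) and one deleted row (deleted = 1)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TableA (id INTEGER PRIMARY KEY, field INTEGER);
        CREATE TABLE TableB (id INTEGER PRIMARY KEY, deleted INTEGER);
        INSERT INTO TableA VALUES (1, 5), (2, 5);
        INSERT INTO TableB VALUES (1, 0), (2, 1);
    """)
    return conn

def run_join(conn, join_kind, filter_in_on):
    """Run the query with b.deleted = 0 either in the ON clause or in WHERE."""
    if filter_in_on:
        sql = f"""SELECT a.id, b.id FROM TableA a
                  {join_kind} JOIN TableB b ON b.id = a.id AND b.deleted = 0
                  WHERE a.field = 5 ORDER BY a.id"""
    else:
        sql = f"""SELECT a.id, b.id FROM TableA a
                  {join_kind} JOIN TableB b ON b.id = a.id
                  WHERE a.field = 5 AND b.deleted = 0 ORDER BY a.id"""
    return conn.execute(sql).fetchall()
```

With `join_kind="INNER"` both placements return the same single row. With `join_kind="LEFT"`, the ON version keeps A row 2 with a NULL on the B side, while the WHERE version drops it, which is the "left join turns into inner join" effect described above.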
No, there is no difference between these two, because in the logical processing of the query, WHERE always comes right after the join filter (ON). In your examples you will have:
Cartesian product (number of rows from TableA x number of rows from TableB)
Filter (ON)
Where.
Your examples use the ANSI SQL-92 join syntax; you could also write the query with the ANSI SQL-89 syntax like this:
SELECT blah FROM TableA a,TableB b
WHERE b.id = a.id AND b.deleted = 0 AND a.field = 5
This is true for inner joins; with outer joins the processing is similar but not the same.

SQL Query execution shortcut OR logic?

I have three tables:
SmallTable
(id int, flag1 bit, flag2 bit)
JoinTable
(SmallTableID int, BigTableID int)
BigTable
(id int, text1 nvarchar(100), otherstuff...)
SmallTable has, at most, a few dozen records. BigTable has a few million, and is actually a view that UNIONS a table in this database with a table in another database on the same server.
Here's the join logic:
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
Average joined size is a few thousand results. Everything shown is indexed.
For most SmallTable records, flag1 and flag2 are set to 1, so there's really no need to even access the index on BigTable.text1, but SQL Server does anyway, leading to a costly Indexed Scan and Nested Loop.
Is there a better way to hint to SQL Server that, if flag1 and flag2 are both set to 1, it shouldn't even bother looking at text1?
Actually, if I can avoid the join to BigTable completely in these cases (JoinTable is managed, so this wouldn't create an issue), that would make this key query even faster.
SQL Boolean evaluation does NOT guarantee operator short-circuit. See On SQL Server boolean operator short-circuit for a clear example showing how assuming operator short circuit can lead to correctness issues and run-time errors.
On the other hand, the very example in my link shows what does work for SQL Server: providing an access path that SQL can use. So, as with all SQL performance problems and questions, the real problem is not in the way the SQL text is expressed, but in the design of your storage. I.e., what indexes does the query optimizer have at its disposal to satisfy your query?
I don't believe SQL Server will short-circuit conditions like that unfortunately.
So I'd suggest doing 2 queries and UNIONing them together: the first query with the s.flag1=1 and s.flag2=1 WHERE conditions, and the second query doing the join onto BigTable with the s.flag1<>1 or s.flag2<>1 conditions.
This article on the matter is worth a read, and includes the bottom line:
...SQL Server does not do short-circuiting like it is done in other programming languages and there's nothing you can do to force it to.
Update:
This article is also an interesting read and contains some good links on this topic, including a technet chat with the development manager for the SQL Server Query Processor team which briefly mentions that the optimizer does allow short-circuit evaluation. The overall impression I get from various articles is "yes, the optimizer can spot the opportunity to short circuit but you shouldn't rely on it and you can't force it". Hence, I think the UNION approach may be your best bet. If it's not coming up with a plan that takes advantage of an opportunity to short cut, that would be down to the cost-based optimizer thinking it's found a reasonable plan that does not do it (this would be down to indexes, statistics etc).
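Before relying on a two-query split, it is worth checking that it returns exactly the rows of the original predicate. The sketch below does that with Python's sqlite3 on invented sample data; it is a correctness check only and says nothing about SQL Server's plans. The second branch here is written as "not both flags set, and the original conditions", which is an assumption about how the split should be expressed.

```python
import sqlite3

JOINS = """FROM SmallTable s
           INNER JOIN JoinTable j ON j.SmallTableID = s.id
           INNER JOIN BigTable b ON b.id = j.BigTableID"""
PREDICATE = """(s.flag1 = 1 OR b.text1 NOT LIKE 'pattern1%')
           AND (s.flag2 = 1 OR b.text1 <> 'value1')"""

def build_flag_db():
    """Four flag combinations, each linked to three hypothetical BigTable rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE SmallTable (id INTEGER PRIMARY KEY, flag1 INTEGER, flag2 INTEGER);
        CREATE TABLE BigTable (id INTEGER PRIMARY KEY, text1 TEXT);
        CREATE TABLE JoinTable (SmallTableID INTEGER, BigTableID INTEGER);
        INSERT INTO SmallTable VALUES (1, 1, 1), (2, 1, 0), (3, 0, 1), (4, 0, 0);
        INSERT INTO BigTable VALUES (1, 'pattern1abc'), (2, 'value1'), (3, 'other');
        INSERT INTO JoinTable SELECT s.id, b.id FROM SmallTable s, BigTable b;
    """)
    return conn

def original_rows(conn):
    """The single query with the OR conditions from the question."""
    sql = f"SELECT s.id, b.id {JOINS} WHERE {PREDICATE}"
    return conn.execute(sql).fetchall()

def split_rows(conn):
    """Branch 1: both flags set. Branch 2: everything else, text conditions applied."""
    sql = f"""SELECT s.id, b.id {JOINS}
              WHERE s.flag1 = 1 AND s.flag2 = 1
              UNION ALL
              SELECT s.id, b.id {JOINS}
              WHERE NOT (s.flag1 = 1 AND s.flag2 = 1) AND {PREDICATE}"""
    return conn.execute(sql).fetchall()
```

Note that the mixed cases (only one flag set) must land in the second branch; a second branch that requires both flags to be 0 would silently lose those rows.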
It's not elegant, but it should work...
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1 = 1 and s.flag2 = 1) OR
(
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
)
SQL Server usually grabs the subquery hint (though it's free to discard it):
SELECT *
FROM (
SELECT * FROM SmallTable where flag1 <> 1 or flag2 <> 1
) s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
...
No idea if this will be faster without test data... but it sounds like it might
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1) AND (s.flag2=1)
UNION ALL
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
NOT (s.flag1=1 AND s.flag2=1) -- rows not already returned by the first branch
AND (s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
Please let me know what happens
Also, you might be able to speed this up by returning just a unique id from this query and then using the result of that to get all the rest of the data.
edit
something like this?
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE
(s.flag1=1) AND (s.flag2=1)
UNION ALL
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE EXISTS
(SELECT 1 from BigTable b
WHERE
(s.flag1=0 AND b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=0 AND b.text1 <> 'value1')
)
Hope this works - careful of shortcut logic in case statements around aggregates but...
SELECT * FROM
SmallTable s
INNER JOIN JoinTable j ON j.SmallTableID = s.ID
INNER JOIN BigTable b ON b.ID = j.BigTableID
WHERE 1=case when (s.flag1 = 1 and s.flag2 = 1) then 1
when (
(s.flag1=1 OR b.text1 NOT LIKE 'pattern1%')
AND (s.flag2=1 OR b.text1 <> 'value1')
) then 1
else 0 end

Resources