I have a stored procedure that makes heavy use of multiple levels of nested derived tables. What is the best way to debug this type of query, so that you can see what is coming out of the inner derived tables? Any thoughts?
Sometimes, I'll at least temporarily pull those derived tables out into a table variable or temp table so I can get a better look at what's going on.
So, in an over-simplified example:
select *
from table_a
inner join (select * from table_b) b
...
would become
select *
into #tempb
from table_b
select * from #tempb /* for debugging purposes */
select *
from table_a
inner join #tempb b
...
I have come across this situation multiple times wherein I need to grab data from one or another table based on some parameter to the stored procedure. Let me clarify with an example. Suppose we need to grab some data from either an archived table or an online table and a bunch of other tables. I can think of 3 ways to accomplish this:
1) Use an if condition, store the result in a temp table, and then join the temp table to the other tables.
2) Use an if condition, grab the data from either the archive table or the online table, and join the other tables. The entire query is duplicated except for the archive/online table part.
3) Use a union subquery.
Query for Approach 1
create table #archivedOrOnline (Id int);
declare @archivedData as bit = 1;
if (@archivedData = 1)
begin
insert into #archivedOrOnline
select
at.Id
from
dbo.ArchivedTable at
end
else
begin
insert into #archivedOrOnline
select
ot.Id
from
dbo.OnlineTable ot
end
select
*
from
#archivedOrOnline ao
inner join dbo.AnotherTable at on ao.Id = at.Id;
-- Lots more joins and subqueries irrespective of @archivedData
Query for Approach 2
declare @archivedData as bit = 1;
if (@archivedData = 1)
begin
select
*
from
dbo.ArchivedTable at
inner join dbo.AnotherTable another on at.Id = another.Id
-- Lots more joins and subqueries irrespective of @archivedData
end
else
begin
select
*
from
dbo.OnlineTable ot
inner join dbo.AnotherTable at on ot.Id = at.Id
-- Lots more joins and subqueries irrespective of @archivedData
end
Query for Approach 3
declare @archivedData as bit = 1;
select
*
from
(
select
ot.Id
from
dbo.OnlineTable ot
where
@archivedData = 0
union
select
at.Id
from
dbo.ArchivedTable at
where
@archivedData = 1
) archiveOrOnline
inner join dbo.AnotherTable at on at.Id = archiveOrOnline.Id;
-- Lots more joins and subqueries irrespective of @archivedData
Basically I am asking which approach to choose or if there is a better approach. Approach 2 will have a lot of duplicate code. The other 2 approaches remove code duplication. I even have the query plans but my knowledge of making decisions based on the query plan is limited. I always go with the approach which removes code duplication. If there is a performance issue, I may choose another approach.
Your approach 3 can work fine. You should definitely use UNION ALL rather than UNION, though, so that SQL Server does not add operations to remove duplicates from the tables.
For the best chance of success with approach 3, add an OPTION (RECOMPILE) hint so that SQL Server can simplify out the unneeded table reference at compile time, at the expense of recompiling the statement on each execution.
If the query is executed too frequently for that to be attractive, you may still get an OK plan without the hint, with filters that use startup predicates so that only the relevant table is accessed at run time. However, this more generic plan may suffer from poor cardinality estimates, and it may limit the optimisations available and give you a worse plan than approach 2.
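As a rough sketch of the first suggestion, approach 3 from the question could be rewritten with UNION ALL and the recompile hint as follows. Table and column names are taken from the question; the aliases art and another are only illustrative.

```sql
-- Sketch only: names come from the question, aliases are illustrative.
DECLARE @archivedData bit = 1;

SELECT *
FROM (
    SELECT ot.Id
    FROM dbo.OnlineTable AS ot
    WHERE @archivedData = 0
    UNION ALL              -- no duplicate-removal step, unlike UNION
    SELECT art.Id
    FROM dbo.ArchivedTable AS art
    WHERE @archivedData = 1
) AS archiveOrOnline
INNER JOIN dbo.AnotherTable AS another ON another.Id = archiveOrOnline.Id
-- Lots more joins and subqueries irrespective of @archivedData
OPTION (RECOMPILE);        -- compile with the current value of @archivedData
```

The hint goes at the very end of the statement, after all joins and any WHERE clause.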
If you don't mind extra unused columns in your results, you can represent such "IF"s with additional join conditions.
SELECT stuff
FROM MainTable AS m
LEFT JOIN ArchiveTable AS a ON @archivedData = 1 AND m.id = a.id
LEFT JOIN OnlineTable AS o ON @archivedData <> 1 AND m.id = o.id
;
If the Archive and Online tables have the same fields, you can even avoid the extra result fields with select expressions like COALESCE(a.field1, o.field1) AS field1
If there are subsequent joins that depend on values from ArchiveTable or OnlineTable, this can be simplified by performing these core joins in a subquery (at least some COALESCEs will still be necessary, though)
SELECT stuff
FROM (
SELECT m.stuff, a.stuff, o.stuff
, COALESCE(a.field1, o.field1) AS xValue
, COALESCE(a.field2, o.field2) AS yValue
, COALESCE(a.field3, o.field3) AS zValue
FROM MainTable AS m
LEFT JOIN ArchiveTable AS a ON @archivedData = 1 AND m.id = a.id
LEFT JOIN OnlineTable AS o ON @archivedData <> 1 AND m.id = o.id
) AS coreQuery
INNER JOIN xTable AS x ON x.something = coreQuery.xValue
INNER JOIN yTable AS y ON y.something = coreQuery.yValue
INNER JOIN zTable AS z ON z.something = coreQuery.zValue
;
If there are criteria narrowing down the MainTable rows to be used, the WHERE for them should be included in the subquery to minimize the amount of Archive/Online data carried out of the subquery.
If the Archive/Online table is actually the "main" table, the question's option 3 should work, but I would suggest putting any filtering criteria relevant to those tables in their UNIONed subqueries.
If there are no filtering criteria on whatever table is "main", I would consider just maintaining two queries (or building one dynamically) so that the subqueries these approaches necessitate are not needed and will not interfere with index use.
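For the "building one dynamically" option, a minimal sketch might look like the following. It assumes the MainTable, ArchiveTable and OnlineTable names used in this answer; the variable names @sourceTable and @sql and the alias src are hypothetical.

```sql
-- Sketch only: @sourceTable, @sql and the alias src are illustrative names.
DECLARE @archivedData bit = 1;

-- Pick the table name from a fixed list, never from free user input.
DECLARE @sourceTable sysname =
    CASE WHEN @archivedData = 1 THEN N'ArchiveTable' ELSE N'OnlineTable' END;

DECLARE @sql nvarchar(max) = N'
SELECT m.*, src.*
FROM MainTable AS m
LEFT JOIN ' + QUOTENAME(@sourceTable) + N' AS src ON m.id = src.id;';

EXEC sys.sp_executesql @sql;
```

Because only one table is referenced in the generated statement, no COALESCE expressions or wrapping subquery are needed, at the cost of having the query text live in a string.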
I have two tables that I have joined together. I'd like to join the result of the joined table with the results of a stored procedure that has two variables.
I'm not sure whether or not I should create two temporary tables or another function, so I'm a little lost on where I should even start and what the easiest method would be.
Below is my first join.
SELECT *
FROM dbo.Users a WITH (NOLOCK)
JOIN Company b ON a.email = b.email
Below is my stored procedure, all it does is split one column into more rows. Split is another function. I would like to use an inner join.
SELECT a.*, b.*
FROM [dbo].[Menu] a
CROSS APPLY dbo.Split(SalesPersons, ',') b
WHERE ID = @ID AND Date = @Date
The easiest way to do this, assuming that the output from the stored procedure is deterministic, would be to load the output of the stored procedure into a temp table and then join to it.
-- Placeholder columns: they must match the procedure's result set
-- in number, order and type.
CREATE TABLE #tmp
(
ID INT NOT NULL,
COL2 INT NOT NULL
)
INSERT INTO #tmp
Exec sproc_YourSproc 'Params'
SELECT *
FROM dbo.Users u
INNER JOIN dbo.Company c ON u.email = c.email
INNER JOIN #tmp t ON t.ID = c.ID
That being said, as Martin Smith said above, you probably want to move that logic into the stored procedure if possible.
Also, please don't use (NOLOCK). It doesn't really help in the way most people think it does, and it can cause some really nasty results (double-reading rows, ghost records, etc.).
If you need to be able to perform reads without causing read/write contention, I would investigate using more optimistic isolation levels, find ways to optimize the read performance to reduce possible congestion, or find indexing strategies that would make it possible to satisfy reads without locking the table itself.
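As one illustration of the optimistic isolation levels mentioned above, SQL Server offers row-versioning-based isolation. The sketch below uses YourDatabase as a placeholder name, and the two options are alternatives that would normally be run separately, not as one script.

```sql
-- Sketch only: YourDatabase is a placeholder.

-- Option 1: make the default READ COMMITTED level use row versioning.
-- This needs exclusive access to the database while it is switched on.
ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON;

-- Option 2: allow SNAPSHOT isolation, then opt in per session.
ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT *
FROM dbo.Users AS u
INNER JOIN dbo.Company AS c ON u.email = c.email;
COMMIT TRANSACTION;
```

With either option, readers see the last committed version of each row instead of waiting on writers, without the dirty-read behaviour of (NOLOCK). Row versioning adds load on tempdb, so it is worth testing before enabling it in production.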
Is there a more efficient way to write this? I'm not sure this is the best way to implement this.
select *
from stat.UniqueTeams uTeam
Left Join stat.Matches match
on match.AwayTeam = uTeam.Id or match.HomeTeam = uTeam.id
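Using OR in a JOIN condition is bad practice, because SQL Server often cannot use indexes effectively for it.
A better way is to use two selects with UNION. Note that UNION removes duplicate rows, and that a team with matches on only one side (home or away) also gets a row of NULL match columns from the other branch, so the result is not identical to the OR join: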
SELECT *
FROM stat.UniqueTeams uTeam
LEFT JOIN stat.Matches match
ON match.AwayTeam = uTeam.Id
UNION
SELECT *
FROM stat.UniqueTeams uTeam
LEFT JOIN stat.Matches match
ON match.HomeTeam = uTeam.id
Things to note when using LEFT JOIN in a query:
1) First of all, a left join can introduce NULLs, which can be a performance issue because NULLs are treated separately by the server engine.
2) The table being joined as nullable should not be bulky; otherwise the query will be costly to execute (performance and resources).
3) Try to join on columns that are already indexed. If you need to join on columns that are not, it is better to build indexes for them first.
In your case, two columns from the same table are left-joined to another table. A good approach here is to build a single derived table that exposes the required data under one key column, as shown below:
; WITH Match AS
(
-- Select all required columns and alias the key column(s) as shown below
SELECT match1.*, match1.AwayTeam AS TeamId FROM stat.Matches match1
UNION
SELECT match2.*, match2.HomeTeam AS TeamId FROM stat.Matches match2
)
SELECT
*
FROM
stat.UniqueTeams uTeam
-- OUTER APPLY needs a table expression, so the filter goes inside a subquery
OUTER APPLY (SELECT * FROM Match m WHERE m.TeamId = uTeam.Id) AS teamMatch
I have used OUTER APPLY, which is similar to LEFT OUTER JOIN but is executed differently. It is evaluated like a table-valued function for each outer row, which may perform better in your case.
My answer is not quite to the point, but I found this question while searching for an "or" condition in an inner join, so it may be useful to the next person searching.
We can use the legacy syntax for the inner join case:
select *
from stat.UniqueTeams uTeam, stat.Matches match
where match.AwayTeam = uTeam.Id or match.HomeTeam = uTeam.id
Note: this query performs badly (a cross join first, then a filter), but it works with many conditions and is suitable for researching dirty data (for example t1.id = t2.id or t1.name = t2.name).
Suppose I have 2 tables that I need to join. There are 2 ways to write the SQL:
select * from taba a join tabb b on a.id =b.id where ...
select * from taba a, tabb b where a.id = b.id and ...
Which one has better performance, or is this only a syntax difference between SQL standards that has no effect on performance?
This has already been answered here:
stackoverflow.com/questions/1129923/is-a-join-faster-than-a-where
The query optimizer usually makes better use of a JOIN than of a WHERE clause (so in theory the JOIN is better), but the database engine you are using has the last word.
The best advice is to try both and compare.
OK, basically what is needed is a way to have row numbers while using a lot of joins, with WHERE clauses that use these row numbers.
Something like:
select ADDRESS.ADDRESS FROM ADDRESS
INNER JOIN WORKHISTORY ON WORKHISTORY.ADDRESSRID=ADDRESS.ADDRESSRID
INNER JOIN PERSON ON PERSON.PERSONRID=WORKHISTORY.PERSONRID
WHERE PERSONRID=<some number> AND WORKHISTORY.ROWNUMBER=1
ROWNUMBER needs to be generated for this query on that one table, though. That way, if we want to access the second WORKHISTORY record's address, we could just use WORKHISTORY.ROWNUMBER=2, and if, say, two addresses matched, we could cycle through the addresses for one WORKHISTORY record using ADDRESS.ROWNUMBER=1 and ADDRESS.ROWNUMBER=2.
This should be capable of being an automatically generated query. Thus, there could be more than 10 inner joins in order to get to the relevant table, and we need to be able to cycle through each table's records independently of the rest of the tables.
I'm aware there are the RANK and ROW_NUMBER functions, but I'm not seeing how they will work for me because of all the inner joins.
Note: in this example query, ROWNUMBER should be automatically generated! It should never be stored in the actual table.
Can you use a temp table?
I ask because you can write the code like this:
select a.field1, b.field2, c.field3, identity (int, 1,1) as TableRownumber into #temp
from table1 a
join table2 b on a.table1id = b.table1id
join table3 c on b.table2id = c.table2id
select * from #temp where ...
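If a temp table is not an option, one possible alternative is to number the rows of a single table inside a derived table with ROW_NUMBER() and join to that. This is only a sketch: WORKHISTORYRID is a hypothetical column used to give the numbering a stable order, and @PersonRid stands in for the "<some number>" placeholder in the question.

```sql
-- Sketch only: WORKHISTORYRID and @PersonRid are assumptions.
DECLARE @PersonRid int = 1;

SELECT a.ADDRESS
FROM (
    -- Number each person's work history rows independently.
    SELECT wh.PERSONRID,
           wh.ADDRESSRID,
           ROW_NUMBER() OVER (PARTITION BY wh.PERSONRID
                              ORDER BY wh.WORKHISTORYRID) AS ROWNUMBER
    FROM WORKHISTORY AS wh
) AS whNumbered
INNER JOIN ADDRESS AS a ON a.ADDRESSRID = whNumbered.ADDRESSRID
INNER JOIN PERSON AS p ON p.PERSONRID = whNumbered.PERSONRID
WHERE p.PERSONRID = @PersonRid
  AND whNumbered.ROWNUMBER = 1;
```

Because the numbering is computed inside the derived table, before the other joins, each table can get its own independent row number in the same way, and nothing is stored in the actual tables.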