I'm looking to convert this SQL Server (T-SQL) query that uses a cross apply to Oracle 11g. Oracle does not support Cross Apply until 12g, so I have to find a work-around. The idea behind the query is for each Tab.Name that = 'Foobar', I need find the previous row's name with the same ID ordered by Tab.Date. (This table contains multiple rows for 1 ID with different Name and Date).
Here is the T-SQL code:
SELECT DISTINCT t1.ID
t1.Name,
t1.Date,
t2.Date as 'PreviousDate',
t2.Name as 'PreviousName'
FROM Tab t1
OUTER apply (SELECT TOP 1 t2.Date,
t2.Name
FROM Tab t2
WHERE t1.Id = t2.Id
ORDER BY t2.Date DESC) t2
WHERE t1.Name = 'Foobar' )
Technically, I was able to recreate this same functionality in Oracle using LEFT JOIN and LAG() function:
SELECT DISTINCT t1.ID
t1.Name,
t1.Date,
t2.PreviousDate as PreviousDate,
t2.PreviousName as PreviousName
FROM Tab t1
LEFT JOIN (
SELECT ID,
LAG(Name) OVER (PARTITION BY ID ORDER BY PreviousDate) as PreviousName,
LAG(Date) OVER (PARTITION BY ID ORDER BY PreviousDate) as PreviousDate
FROM Tab) t2 ON t2.ID = t1.ID
WHERE t1.Name = 'Foobar'
The issue is the order it executes the Oracle query. It will pull back ALL rows from Tab, order them (because of the LAG function), then it will filter them down using the ON statement when it joins it to the main query. That table has millions of records, so doing that for EACH ID is not feasible. Basically, I want to change the order of operations in the sub-query to just pull back rows for a single ID, sort those rows to find the previous, and join that. Any ideas on how to tweak it?
TL;DR
SQL Server: filters, orders, joins
Oracle: orders, filters, joins
You can look for the latest row per (id) group with row_number():
select *
from tab t1
left join
(
select row_number() over (
partition by id
order by Date desc) as rn
, *
from t2
) t2
on t1.id = t2.id
and t2.rn = 1 -- Latest row per id
Related
I have 2 tables (pics 1,2). There are 2 columns in tab1 and many in tab2. There is ID column which is same for every record in both tables. I need output (pic 3) with every column from both tables grouped by ID. Left or right join doesn't really matter. I just need match records by ID and have every column of both tables listed.
Thank you.
1 https://i.stack.imgur.com/1OmDT.png
2 https://i.stack.imgur.com/nShTy.png
3 https://i.stack.imgur.com/vLIFm.png
You can simply use this simple query :
select a.ID, b.name, b.code, a.quantity
from Tab1 a
inner join Tab2 b on a.ID = b.ID
group by a.ID, b.name, b.code, a.quantity
order by a.ID
and you can learn more from here https://www.w3schools.com/sql/sql_join.asp
select tab1.id,name,code, quantity from tab1 join tab2 using(id);
The JOINs(INNER OR LEFT) will change in the below query depending on whether you need to show the ID info irrespective you have any quantity(NULL or 0 using left join) for it in tab2 or only show the quantity for matching IDs(INNER join) there.
SELECT
t1.*,
SUM(t2.quantity) AS quantity
FROM
tab1 AS t1
INNER JOIN tab2 AS t2 ON t1.id = t2.id -- CHANGE THIS TO LEFT JOIN IF NEED
GROUP BY
t1.id,
t1.name,
t1.code
Suppose I have tables 1-4, all the other tables are linked to table1. For what its worth, table1, table2 and table3 are relatively small but table4 contains a lot of data.
Now I have the following query:
SELECT t1.id
, (SELECT COUNT(*) FROM table2 WHERE table1_id = t1.id) AS t2_count
, (SELECT COUNT(*) FROM table3 WHERE table1_id = t1.id) AS t3_count
, (SELECT COUNT(*) FROM table4 WHERE table1_id = t1.id) AS t4_count
FROM table1 t1
Due to the fact that the subqueries are dependent/correlated I assumed that there must be a better way (performance wise) to get the data.
I tried to do the following but it drastically increased the execution time (from about 2s to 35s). I'm guessing that the multiple left joins creates a very big data set?!
SELECT t1.id
, COUNT(t2.id) AS t2_count
, COUNT(t3.id) AS t3_count
, COUNT(t4.id) AS t4_count
FROM table1 t1
LEFT JOIN table2 t2 ON t2.table1_id = t1.id
LEFT JOIN table3 t3 ON t3.table1_id = t1.id
LEFT JOIN table4 t4 ON t4.table1_id = t1.id
GROUP BY t1.id
Is there better way to get the counts? I don't need the data from the other tables.
UPDATE:
Bart's answer got me thinking that the table1_id columns are nullable. I added a IS NOT NULL check to the WHERE clauses and this brought the time down to 1s.
SELECT t1.id
, (SELECT COUNT(*) FROM table2 WHERE table1_id IS NOT NULL AND table1_id = t1.id) AS t2_count
, (SELECT COUNT(*) FROM table3 WHERE table1_id IS NOT NULL AND table1_id = t1.id) AS t3_count
, (SELECT COUNT(*) FROM table4 WHERE table1_id IS NOT NULL AND table1_id = t1.id) AS t4_count
FROM table1 t1
I guess not. If you execute a SELECT COUNT(*) FROM [table], it should perform a count on the table's PK. That should be pretty fast, even for very large tables.
Is your table4 a real table (and not a view, or a table-valued function, or something else that looks like a table)? And does it have a primary key? If so, I don't think that the performance of a SELECT COUNT(*) FROM [table4] query can be increased significantly.
It may also be the case, that your table4 is heavily targeted (in concurrent transactions over multiple connections), or perhaps your SQL Server is doing some heavy IO or computations. I cannot assume anything about that, however. You may try to check if your query is also slow on a restored database backup on a physically separate test server.
I am using SQL Server 2000 so sadly common table expressions are out but I have the following 2 table structure:
Invoices
id
InvoiceQueryReasons
id
invoice_id
reason
date_queried
I want to inner join from Invoices to InvoiceQueryReasons and only get 1 row for each invoice that selects the most recent InvoiceQueryReasons record.
If I was using a CTE with postgress I would do something like:
SELECT *
FROM
(SELECT
"i"."id", "iq"."reason", "iq"."date_queried",
rank() OVER (PARTITION BY "invoice_id" ORDER BY "date_queried" DESC) AS "invoice_query_rnk",
FROM "invoices" i
INNER JOIN "InvoiceQueryReasons" iq ON ("i"."id" = "iq"."invoice_id")
) AS "iqrs"
INNER JOIN
"invoices" ON ("iqrs"."id" = "invoices"."id")
WHERE
("invoice_query_rnk" = 1)
Apologies if the query is not exactly right, I am writing this on a non-dev machine.
How could I write a similar query in SQL Server 2000 where I do not have common table expression?
This is the way I always used to do this in SQL Server 2000:
FROM Table1
INNER JOIN Table2 ON Table2.PrimaryKey=(
SELECT TOP 1 t2.PrimaryKey
FROM Table2 t2
WHERE t2.ForeignKey=Table1.PrimaryKey
ORDER BY t2 DateColumn DESC
)
I am new to Microsoft SQL Server. I am trying to join two tables that has common key named CampaignID using LEFT OUTER JOIN. I need to reuse the result in a different query, so I decided to capture the result set using CTE_Results. For example,
-- This is my CTE script
WITH CTE_Results AS
(
SELECT t1.CampaignID, t2.CampaignID, t1.Name, t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID
)
-- This is the script I want to use to compare the resulting table. For example,
SELECT Vendor
FROM CTE_Results
However, when I ran above, I get:
The column `CampaignID` was specified multiple times for `CTE_Results`.
From reading through old StackOverflow questions and answers, it seems like since CampaignID is in both tables that are being joined, I must use table aliases to specify whose (which table's) CampaignID I want to SELECT. But I think I did that and even that it seems like the error still occurs.
Is there a way for me to select and keep BOTH CampaignID's in my CTE? If so, what should be changed? Thank you for the answers!
You have CampaignID selected twice in CTE, use different alias name to fix the problem
WITH CTE_Results
AS (SELECT t1.CampaignID AS cd_CampaignID,
t2.CampaignID AS cod_CampaignID,
t1.NAME,
t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID)
-- This is the script I want to use to compare the resulting table. For example,
SELECT Vendor
FROM CTE_Results
or use this
WITH CTE_Results(cd_CampaignID, cod_CampaignID, NAME, Vendor)
AS (SELECT t1.CampaignID,
t2.CampaignID,
t1.NAME,
t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID)
-- This is the script I want to use to compare the resulting table. For example,
SELECT Vendor
FROM CTE_Results
You need to Alias the CampaignID Columns in your CTE or define the returned column names in the CTE declaration. Otherwise it would be like creating a table with two columns with the same name.
Example Column Alias:
WITH CTE_Results AS
(
SELECT t1.CampaignID as 'CampaignID1', t2.CampaignID as 'CampaignID2', t1.Name, t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID
)
Or In CTE declaration:
WITH CTE_Results (CampaignID1, CampaignID2, [Name], Vendor) AS
(
SELECT t1.CampaignID, t2.CampaignID , t1.Name, t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID
)
Is it possible to improve performance by taking the following SQL:
SELECT t1.id,
t1.name,
t2.subname,
t2.refvalue
FROM table1 AS t1
CROSS apply (SELECT TOP 1 t2.subid,
t2.subname,
t3.refvalue
FROM table2 AS t2
INNER JOIN table3 AS t3
ON t2.subid = t3.subid
ORDER BY lastupdated DESC) AS t2
And rewriting it so that it looks like this:
SELECT t1.id,
t1.name,
t2.subname,
t3.refvalue
FROM table1 AS t1
CROSS apply (SELECT TOP 1 t2.subid,
t2.subname
FROM table2 AS t2
ORDER BY lastupdated DESC) AS t2
INNER JOIN table3 AS t3
ON t2.subid = t3.subid
Firstly, does it give the same result?
If so, what does the query plan say, and also set statistics io on?
How many rows in Table1, Table2 and Table3? How many intersect and end up in the result? I'm trying to figure out the purpose of rewriting the query, and agree with gbn... do you get the same result, does the query plan look the same in both cases, do the statistics i/o get any better, and does the rewritten query run any faster?