We have a view defined as follows:
CREATE view [dbo].[PriceHourlyView]
AS
select NodeId, TimeStamp,RtDa,MarketId,Lmp,Mlc,Mcc,Issettledprice,datecreated,IsCalculated from dbo.PriceHourly WITH (NOLOCK)
union all
select NodeId, TimeStamp,RtDa,MarketId,Lmp,Mlc,Mcc,Issettledprice,datecreated,IsCalculated from dbo.PriceHourly2018 WITH (NOLOCK)
union all
select NodeId, TimeStamp,RtDa,MarketId,Lmp,Mlc,Mcc,Issettledprice,datecreated,IsCalculated from dbo.PriceHourly2017 WITH (NOLOCK)
union all
select NodeId, TimeStamp,RtDa,MarketId,Lmp,Mlc,Mcc,Issettledprice,datecreated,IsCalculated from dbo.PriceHourly2016 WITH (NOLOCK)
union all
select NodeId, TimeStamp,RtDa,MarketId,Lmp,Mlc,Mcc,Issettledprice,datecreated,IsCalculated from dbo.PriceHourly2015 WITH (NOLOCK)
union all
select NodeId, TimeStamp,RtDa,MarketId,Lmp,Mlc,Mcc,Issettledprice,datecreated,IsCalculated from dbo.PriceHourly2014 WITH (NOLOCK)
union all
select NodeId, TimeStamp,RtDa,MarketId,Lmp,Mlc,Mcc,Issettledprice,datecreated,IsCalculated from dbo.PriceHourly2013 WITH (NOLOCK)
union all
select NodeId, TimeStamp,RtDa,MarketId,Lmp,Mlc,Mcc,Issettledprice,datecreated,IsCalculated from dbo.PriceHourly2012 WITH (NOLOCK)
Each of the yearly tables has a check constraint like the following. The only exception is the current table, PriceHourly, which has no year in its name and no constraint:
ALTER TABLE [dbo].[PriceHourly2017] WITH CHECK ADD CONSTRAINT [CK_PriceHourly2017_Timestamp] CHECK (([timestamp]>='2017-01-01' AND [timestamp]<='2017-12-31 23:59'))
When this view is queried by itself, the check constraints limit the tables being searched. The execution plan looks like this:
SELECT
*
FROM PriceHourlyview
WHERE nodeid = 24511
AND TimeStamp BETWEEN '2017-05-17' AND '2017-05-24'
Now when I join to this view on the timestamp field, the query no longer uses the check constraints and searches every table for the data.
SELECT
*
FROM ShapeProfileDetails s WITH (NOLOCK)
LEFT JOIN PriceHourlyView p WITH (NOLOCK)
ON s.TimeStamp = p.Timestamp
AND s.EffectiveDate BETWEEN '2017-05-17' AND '2017-05-24'
WHERE NodeId = 24512
--AND s.EffectiveDate BETWEEN '2017-05-17' AND '2017-05-24'
I know I'm not filtering on the same field in the joined example, and I assume that's the issue, but that is the field I need to filter on to get the correct results. Is there any way to hint or force the query to use the correct check constraints? Or what is the best practice for joining so that these check constraints are used?
Okay, as discussed, we know that EffectiveDate and TimeStamp are nearly the same, so I would try something like this. Logically it is the same query, but the extra range on TimeStamp lets SQL Server know that it can use the check constraints (just subtract one day from the lower bound of the BETWEEN and add one day to the upper bound).
SELECT * FROM ShapeProfileDetails s WITH (NOLOCK)
JOIN PriceHourlyView p WITH (NOLOCK)
ON s.TimeStamp = p.Timestamp
AND s.EffectiveDate BETWEEN '2017-05-17' AND '2017-05-24'
AND s.TimeStamp BETWEEN '2017-05-16' AND '2017-05-25'
WHERE NodeId = 24512
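A variant of the same idea, as a sketch only: put the padded range directly on the view's own Timestamp column, which is the column the check constraints are defined on. It assumes, as the answer above does, that TimeStamp never differs from EffectiveDate by more than one day, and that NodeId comes from the view. PriceHourly itself has no check constraint, so it will still be read either way.

```sql
-- Sketch: the one-day padding is an assumption about how far
-- TimeStamp can drift from EffectiveDate; adjust it to your data.
SELECT s.*, p.*
FROM ShapeProfileDetails s WITH (NOLOCK)
JOIN PriceHourlyView p WITH (NOLOCK)
    ON s.TimeStamp = p.Timestamp
   -- Literal range on the constrained column, so the yearly tables
   -- whose CHECK constraint contradicts it can be skipped.
   AND p.Timestamp BETWEEN '2017-05-16' AND '2017-05-25'
WHERE p.NodeId = 24512
  AND s.EffectiveDate BETWEEN '2017-05-17' AND '2017-05-24';
```

Compare the execution plan against the original to confirm that only PriceHourly and PriceHourly2017 are touched.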
I am selecting data from 2 tables using union select:
select Product_Code from Discount_Table union select Product_Code from Discount2_Table
Union Query returns this
After I select this data, I want to join it with other tables, for example Product_Table, but I'm getting errors.
This is my query
select Product_Name, Price
from Discount_Table
union
select Product_Code
from Discount2_Table
join Product_Table on Discount_Table.Product_Code = Product_Table.Product_Code
Any tips/help would be appreciated!
You can use a subquery like this. Both halves of the UNION must return the same columns, so the subquery selects only Product_Code:
SELECT * FROM
(select Product_Code from Discount_Table union select Product_Code from Discount2_Table) Discount_Table
JOIN Product_Table ON Discount_Table.Product_Code = Product_Table.Product_Code
There are a few different ways you can refer to a results table later in a query, but here are a couple:
You can put the results from your first query into a CTE (Common Table Expression) and then join on that further down in your code:
WITH product_codes (Product_code) AS
(
select Product_Code
from Discount_Table
union
select Product_Code
from Discount2_Table
)
select t.Product_Name, t.Price
from product_codes pc
join Product_Table t on pc.Product_Code = t.Product_Code
You can also use temporary tables:
select Product_Code
INTO #product_codes
from Discount_Table
union
select Product_Code
from Discount2_Table
select t.Product_Name, t.Price
from #product_codes pc
join Product_Table t on pc.Product_Code = t.Product_Code
which works similarly by storing the results from your first query into a temporary table that you can access later on in the query. There are different reasons why you'd choose one version over the other, but they would both work enough to get you results.
I found a nice answer explaining the differences a bit more here.
I have a table of 100,000,000+ values, so efficiency is very important to me. I need to take information from table A, join it to an index table B, then join to table C using the index retrieved from table B. The problem is, there are multiple indexes for each value in table A, and I want to retrieve the one with the most recent date.
The query below creates duplicates:
SELECT ID_1, ID_2, Date
INTO #DEST_TABLE FROM Table_1 t1
INNER JOIN Table_2 t2 ON t1.ID_1=t2.ID_1
INNER JOIN Table_3 t3 ON t2.ID_2=t3.ID_2
This one does not, but when going from 35,000 to 40,000 elements, the execution time jumps from <5sec to >1min:
SELECT ID_1, ID_2, Date
INTO #DEST_TABLE FROM
(SELECT * FROM Table_1 t1 CROSS APPLY Table_2 t2 WHERE t1.ID_1=t2.ID_1) t_temp
LEFT JOIN Table_3 t3 ON t_temp.ID_2=t3.ID_2
How can I decrease my execution time as much as possible?
Here is an example table:
For this table, I would be trying to get the most recent location for each person.
None of the columns are indexed and I cannot create indexes on this table.
First of all, when you are working with 100 million+ records and joining them to other tables, the first thing I would ask is what the rationale is for not creating indexes that can cover your query. If you are not the admin of that system, I would suggest bringing this up with the admin group and trying to understand the exact reason (if any) they do not want an index on that huge table, especially because you mentioned "efficiency is very important to me".
Remember that 'SQL Tuning' is only one of the steps of 'Database Performance Tuning', and you can only tune so much by writing a good SQL query. When the data volume gets huge, a good SQL query is never sufficient without other performance tuning measures.
Apart from what Roger has already provided, here are a few solutions that you can try out:
Solution 1:
SELECT T1.ID_1, OA.ID_2, OA.Location
FROM Table1 T1
OUTER APPLY (
SELECT TOP 1 T3.ID_2, T3.Location
FROM Table2 T2
INNER JOIN Table3 T3
ON T2.ID_2 = T3.ID_2
WHERE T2.ID_1 = T1.ID_1
ORDER BY T3.Date DESC
) OA;
Solution 2:
SELECT DISTINCT
T1.ID_1
,ID_2 = FIRST_VALUE(T3.ID_2) OVER (PARTITION BY T1.ID_1 ORDER BY T3.Date DESC)
,Location = FIRST_VALUE(T3.Location) OVER (PARTITION BY T1.ID_1 ORDER BY T3.Date DESC)
FROM Table1 T1
INNER JOIN Table2 T2
ON T1.ID_1 = T2.ID_1
INNER JOIN Table3 T3
ON T2.ID_2 = T3.ID_2;
Data Preparation:
DROP TABLE IF EXISTS Table1
DROP TABLE IF EXISTS Table2
DROP TABLE IF EXISTS Table3
SELECT TOP 10000 ID_1 = object_id, name
INTO Table1
FROM sys.all_objects
ORDER BY object_id
SELECT ID_1 = T1.ID_1, ID_2 = IDENTITY(INT, 1, 1)
INTO Table2
FROM Table1 T1
CROSS JOIN Table1 T2
SELECT ID_2, Location = 'City_'+ CAST(ID_2 AS VARCHAR(100)), Date = CAST(DATEADD(DAY, ID_2/10000, GETDATE()) AS DATE)
INTO Table3
FROM Table2
Indexes to cover Solution 1:
CREATE NONCLUSTERED INDEX IX_TABLE1_ID_1 ON Table1 (ID_1)
CREATE NONCLUSTERED INDEX IX_TABLE2_ID_2 ON Table2 (ID_1, ID_2)
CREATE NONCLUSTERED INDEX IX_TABLE3_ID_2 ON Table3 (ID_2, Date DESC) INCLUDE (Location)
Execution Plan:
You can see that all operators are 'Index Seek' except for Table1, which is a legitimate 'Index Scan' because the query visits every ID_1 value in Table1. If you put a WHERE clause on the outer query to search for a few specific ID_1 values, that 'Index Scan' will turn into an 'Index Seek' as well.
I will leave the index strategy for the second solution to you (as homework :) ). Tip: you have to make Location a key column as well. Or you can go with a COLUMNSTORE index approach.
You can use something like this:
select top (1) with ties
a.A_Id, b.B_Id, b.Date
from dbo.TableA a
inner join dbo.TableB b on a.A_Id = b.A_Id
inner join dbo.TableC c on c.B_Id = b.B_Id
order by row_number() over(partition by a.A_Id order by b.Date desc);
Alternatively, you can try an olde fashioneth approache:
select a.A_Id, b.B_Id, b.Date
from dbo.TableA a
inner join dbo.TableB b on a.A_Id = b.A_Id
inner join dbo.TableC c on c.B_Id = b.B_Id
where not exists (
select 0 from dbo.TableB pb where pb.A_Id = b.A_Id and pb.Date > b.Date
);
However, as in all such situations, performance will depend heavily on indexes. SSMS can suggest some if you look at the execution plan; off the top of my head, you will need all Id columns to be indexed, and you will need either a single-column index on (Date) or a composite index on (A_Id, Date, B_Id) on TableB.
UPD: If you can't create or modify any indices, and performance is paramount, I would suggest copying the data in question into a separate schema or database, where you might have appropriate permissions. Apart from that... it's impossible to get something out of nothing.
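As a rough sketch of the "copy the data somewhere you can index it" idea: a temp table lives in tempdb, so it can usually be indexed even when the source tables cannot be touched. The sketch uses the question's Table_2 and Table_3 and assumes that Date lives in Table_3; #Links and IX_Links are made-up names for illustration.

```sql
-- Copy only the columns the query needs (assumes Date is in Table_3).
SELECT t2.ID_1, t2.ID_2, t3.[Date]
INTO #Links
FROM Table_2 t2
INNER JOIN Table_3 t3 ON t3.ID_2 = t2.ID_2;

-- Index the copy so rows for each ID_1 are stored newest first.
CREATE CLUSTERED INDEX IX_Links ON #Links (ID_1, [Date] DESC);

-- Keep the most recent row per ID_1.
SELECT ID_1, ID_2, [Date]
FROM (
    SELECT ID_1, ID_2, [Date],
           ROW_NUMBER() OVER (PARTITION BY ID_1 ORDER BY [Date] DESC) AS rn
    FROM #Links
) ranked
WHERE rn = 1;
```

Whether the one-off cost of the copy pays for itself depends on how often the result is needed, so it is worth timing both variants.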
I have a query that uses an IN clause (EXISTS would also work) on multiple columns, combined with OR inside the WHERE clause. Is there a better way to write this query?
SELECT columndata FROM TABLE1
WHERE column1key in (select columnkey from #temptable1)
OR column2key in (select columnkey from #temptable2)
OR column3key IN (SELECT columnkey FROM #temptable3)
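You can use LEFT JOINs as shown below. The WHERE clause keeps only the rows that matched at least one temp table; this assumes columnkey is unique within each temp table, otherwise rows will be duplicated.
SELECT columndata
FROM TABLE1 tab1
LEFT JOIN #temptable1 t1 on tab1.column1key = t1.columnkey
LEFT JOIN #temptable2 t2 on tab1.column2key = t2.columnkey
LEFT JOIN #temptable3 t3 on tab1.column3key = t3.columnkey
WHERE t1.columnkey IS NOT NULL
OR t2.columnkey IS NOT NULL
OR t3.columnkey IS NOT NULL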
You may get better performance from this version, which breaks the SELECT into separate queries and de-duplicates the results afterwards.
SELECT columndata FROM TABLE1
WHERE column1key in (select columnkey from #temptable1)
UNION
SELECT columndata FROM TABLE1
WHERE column2key in (select columnkey from #temptable2)
UNION
SELECT columndata FROM TABLE1
WHERE column3key IN (SELECT columnkey FROM #temptable3)
But you would really have to try it.
With no indexes or bad ones, you still have to scan the same amount of data. With good indexes, this may work better.
As a side note, EXISTS and IN will give the same plan here.
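For reference, the EXISTS form of the original query could look like the sketch below. The aliases t and x are only illustrative; the table and column names are the ones from the question.

```sql
-- Same filter as the IN version, written with correlated EXISTS.
SELECT t.columndata
FROM TABLE1 t
WHERE EXISTS (SELECT 1 FROM #temptable1 x WHERE x.columnkey = t.column1key)
   OR EXISTS (SELECT 1 FROM #temptable2 x WHERE x.columnkey = t.column2key)
   OR EXISTS (SELECT 1 FROM #temptable3 x WHERE x.columnkey = t.column3key);
```

Unlike the UNION variant, this keeps duplicate columndata values that come from different rows of TABLE1, so it matches the original query row for row.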
I am new to Microsoft SQL Server. I am trying to join two tables that have a common key named CampaignID using LEFT OUTER JOIN. I need to reuse the result in a different query, so I decided to capture the result set in a CTE named CTE_Results. For example,
-- This is my CTE script
WITH CTE_Results AS
(
SELECT t1.CampaignID, t2.CampaignID, t1.Name, t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID
)
-- This is the script I want to use to compare the resulting table. For example,
SELECT Vendor
FROM CTE_Results
However, when I run the above, I get:
The column `CampaignID` was specified multiple times for `CTE_Results`.
From reading through old StackOverflow questions and answers, it seems that since CampaignID is in both of the joined tables, I must use table aliases to specify which table's CampaignID I want to SELECT. I think I did that, but the error still occurs.
Is there a way for me to select and keep BOTH CampaignID's in my CTE? If so, what should be changed? Thank you for the answers!
You have CampaignID selected twice in the CTE. Give each one a different alias to fix the problem:
WITH CTE_Results
AS (SELECT t1.CampaignID AS cd_CampaignID,
t2.CampaignID AS cod_CampaignID,
t1.NAME,
t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID)
-- This is the script I want to use to compare the resulting table. For example,
SELECT Vendor
FROM CTE_Results
or use this
WITH CTE_Results(cd_CampaignID, cod_CampaignID, NAME, Vendor)
AS (SELECT t1.CampaignID,
t2.CampaignID,
t1.NAME,
t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID)
-- This is the script I want to use to compare the resulting table. For example,
SELECT Vendor
FROM CTE_Results
You need to alias the CampaignID columns in your CTE or define the returned column names in the CTE declaration. Otherwise it would be like creating a table with two columns that have the same name.
Example Column Alias:
WITH CTE_Results AS
(
SELECT t1.CampaignID as 'CampaignID1', t2.CampaignID as 'CampaignID2', t1.Name, t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID
)
Or in the CTE declaration:
WITH CTE_Results (CampaignID1, CampaignID2, [Name], Vendor) AS
(
SELECT t1.CampaignID, t2.CampaignID , t1.Name, t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID
)
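Once both IDs have distinct names, either one can be referenced after the CTE. As a small sketch (the "campaigns without online details" filter is only an illustrative use, not something the question asked for):

```sql
WITH CTE_Results (CampaignID1, CampaignID2, [Name], Vendor) AS
(
    SELECT t1.CampaignID, t2.CampaignID, t1.Name, t2.Vendor
    FROM CampaignDetails AS t1
    LEFT OUTER JOIN CampaignOnlineDetails AS t2
        ON t1.CampaignID = t2.CampaignID
)
-- CampaignID2 is NULL where the LEFT JOIN found no matching row.
SELECT CampaignID1, [Name]
FROM CTE_Results
WHERE CampaignID2 IS NULL;
```

Note that a CTE is only visible to the single statement that immediately follows it, so the SELECT has to be part of the same statement.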
I have a query like this:
Select Count(*) as TotalCount, Object2_ID, Object_ID, Object_Description
from Table1
inner join table2 on...
Group BY Object2_ID, Object_ID
I can't run this query because the column Object_Description is neither in the GROUP BY nor inside an aggregate function. Object_Description is a text column, and I need any value of it. For now I use MAX(Object_Description), which gives me the right results because Object_Description is the same within each group.
I could use MAX() or MIN() etc. and still get the right results in my query.
The question is: what is the most efficient way to do this?
I think that MAX() or MIN() adds a small overhead.
You can fetch Object_Description later, after calculating the quantity (assuming the description is in Table1 and the count comes from Table2):
SELECT t.Object_Id, t2.Object_Description, t.Qty
FROM
(
SELECT Object_Id, Count(*) Qty
FROM Table2
GROUP BY Object_Id
) t
JOIN Table1 t2 on t2.Object_Id = t.Object_Id