Is it a good idea to put data into temp table first before joining several other tables?
For instance, let's say I have the following:
tableA, 5 million rows
tableB, 5 million rows
tableC, 5 million rows
...
tableG
The Query I want to perform may look like:
SELECT 1 FROM tableA
INNER JOIN tableB WITH (NOLOCK) ON tableA.col1= tableB.col1
LEFT JOIN tableC WITH (NOLOCK) ON ...
...
LEFT JOIN tableG WITH (NOLOCK) ON ...
WHERE tableA.someCol= conditionA AND tableB.someCol= conditionB...
Assuming that, with the filter, only a small subset of tableA will be returned, would it be a good idea to pull the data from tableA into a temp table first, before joining the other tables, so as to avoid blocking and maybe increase performance?
I tried googling but couldn't find any satisfactory answer. Thanks in advance.
Here are the "typicals" that I try. I usually try them out and see what happens under load and under "big data" that represents production row counts, not dev row counts.
Going from memory.
1. If it is "one time" use, I try the derived table method.
2. If the data in the "holder" table can be reused, I start with a @variableTable if the number of rows will be small.
2.b. The only time I've seen a @variableTable screw you is when you produce aggregate results, where the "summary rows" are only a few, but to generate the summary rows you hit a large number of rows. Think something like "Select StateAbbreviation, count(*) from dbo.LargeTableOfData group by StateAbbreviation": there will only be 50 or so rows in the result table, BUT the aggregate data comes from a large table with lots of rows.
3. Then I go to a #TempTable. Most times without an index. Sometimes with an index.
Only 2 or 3 times in my life has the index on the #TempTable resulted in a significant improvement.
It is a "try it out" game. Sometimes you just don't know until you give it the ole college try.
Use Northwind
GO
/* Temp Table , No Index(es) */
IF OBJECT_ID('tempdb..#TempTableNoIndex') IS NOT NULL
begin
drop table #TempTableNoIndex
end
CREATE TABLE #TempTableNoIndex
(
OrderID int
)
Insert into #TempTableNoIndex (OrderID) select top 5 OrderID from dbo.Orders
Select * from dbo.[Order Details] od where exists (select null from #TempTableNoIndex innerHolder where innerHolder.OrderID = od.OrderID )
/* Temp Table , With Index(es) */
IF OBJECT_ID('tempdb..#TempTableWithIndex') IS NOT NULL
begin
drop table #TempTableWithIndex
end
CREATE TABLE #TempTableWithIndex
(
OrderID int
)
CREATE INDEX IX_TEMPTABLE_TempTableWithIndex_OrderID ON #TempTableWithIndex (OrderID)
Insert into #TempTableWithIndex (OrderID) select top 5 OrderID from dbo.Orders
Select * from dbo.[Order Details] od where exists (select null from #TempTableWithIndex innerHolder where innerHolder.OrderID = od.OrderID )
/* Variable Table */
Declare @HolderTable TABLE ( OrderID int )
Insert into @HolderTable (OrderID) select top 5 OrderID from dbo.Orders
Select * from dbo.[Order Details] od where exists (select null from @HolderTable innerHolder where innerHolder.OrderID = od.OrderID )
/* Derived Table */
Select * from dbo.[Order Details] od
join
( select top 5 OrderID from dbo.Orders ) as derived1
on od.OrderID = derived1.OrderID
/* Clean up */
IF OBJECT_ID('tempdb..#TempTableNoIndex') IS NOT NULL
begin
drop table #TempTableNoIndex
end
IF OBJECT_ID('tempdb..#TempTableWithIndex') IS NOT NULL
begin
drop table #TempTableWithIndex
end
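Applied to the tables in the question, the temp table option might look roughly like the sketch below. The names tableA, tableB, col1 and someCol come from the question; @conditionA, @conditionB, their int types and the #FilteredA name are assumptions made for illustration, and the remaining LEFT JOINs are left out because the question elides their conditions.

```sql
-- Sketch only: @conditionA / @conditionB and their types are assumed.
DECLARE @conditionA int = 1, @conditionB int = 2;

IF OBJECT_ID('tempdb..#FilteredA') IS NOT NULL
    DROP TABLE #FilteredA;

-- Step 1: materialize only the small, filtered subset of tableA.
SELECT a.col1
INTO #FilteredA
FROM tableA a
WHERE a.someCol = @conditionA;

-- Step 2: join the small set to the large table(s).
-- The LEFT JOINs to tableC ... tableG would follow the same pattern.
SELECT 1
FROM #FilteredA fa
INNER JOIN tableB b ON fa.col1 = b.col1
WHERE b.someCol = @conditionB;

DROP TABLE #FilteredA;
```

Whether this beats the single-statement query depends on the data and the plan, so compare both under production-sized row counts.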
Related
I work on SQL Server 2012. My issue is that I can't get the parts that are missing from the trade code table but exist in the parts table.
First, I load plid and codetypeid values into the search data table.
Second, I get the related parts from the parts table based on plid.
Third, I get the parts that are missing from the trade code table.
In other words, I need the parts that exist in the parts table, are related to the search data table, and do not exist in the trade code table.
Data Sample :
create table #searchdata
(
plid int,
codetypeid int
)
insert into #searchdata
(plid,codetypeid)
values
(84459,877490)
create table #parts
(
partid int,
plid int
)
insert into #parts(partid,plid)
values
(758901,84459),
(808091,84459),
(509030,84459),
(7090321,84459),
(32453,84459),
(45563,84459)
create table #tradecode
(
partid int,
codetypeid int
)
insert into #tradecode(partid,codetypeid)
values
(758901,877490),
(808091,877490)
select p.plid,s.codetypeid,count(p.partid) as countmissingParts
from #parts p
inner join #searchdata s on s.plid=p.plid
left join #tradecode t on t.codetypeid=s.codetypeid
where t.partid is null
group by p.plid,s.codetypeid
drop table #searchdata
drop table #parts
drop table #tradecode
what I try :
select p.plid,s.codetypeid,count(p.partid) as countmissingParts
from #parts p
inner join #searchdata s on s.plid=p.plid
left join #tradecode t on t.partid=s.plid
where t.partid is null
group by p.plid,s.codetypeid
Expected Result
plid codetypeid countmissingParts
84459 877490 4
You were very close with the query you tried. In your LEFT JOIN you joined partid to plid; you should have joined on p.partid = t.partid. Give this a try:
select p.plid,s.codetypeid,count(p.partid) as countmissingParts
from #parts p
inner join #searchdata s on s.plid=p.plid
left join #tradecode t on p.partid=t.partid
where t.partid is null
group by p.plid,s.codetypeid
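The same anti-join can also be written with NOT EXISTS. The sketch below additionally matches on codetypeid, which is an assumption about the intent (a part counts as missing only for the code type being searched). It assumes the #parts, #searchdata and #tradecode tables from the question still exist, so run it before the DROP TABLE statements.

```sql
-- Count parts related to the search data that have no trade code row
-- for the searched code type.
SELECT p.plid, s.codetypeid, COUNT(p.partid) AS countmissingParts
FROM #parts p
INNER JOIN #searchdata s ON s.plid = p.plid
WHERE NOT EXISTS (
    SELECT 1
    FROM #tradecode t
    WHERE t.partid = p.partid
      AND t.codetypeid = s.codetypeid   -- assumed: match per code type
)
GROUP BY p.plid, s.codetypeid;
```

With the sample data this also gives 4 for plid 84459, since only two of the six parts have a trade code row.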
I'm working in SQL Server 2016 and have a confusing SQL problem. I have a TEMP table that contains unique rows. For each row in this temp table (each AgentNo), I have to insert 5 PRODUCTID values. The 5 PRODUCTID values come from another table, but there is no relationship between the tables. So my question is: how do I insert a row for each ProductID into this temp table for each unique row that is currently in the temp table?
Here is a pic of the TEMP table that requires 5 rows for each:
Here is a pic of what I'm needing to come away with:
Here is my SQL code for both TEMP tables:
IF OBJECT_ID('tempdb..#tempTarget') IS NOT NULL DROP TABLE #tempTarget
SELECT 0 as ProductID, 1 as [Status], a.AgentNo, u.UserID, u.[Password], 'N' as AdminID, tel.LocationSysID --, tel.OwnerID, tel.LocationName, a.OwnerSysID, a.AgentName
INTO #tempTarget
FROM dbo.TEST_EvalLocations tel
INNER JOIN dbo.AGT_Agent a
ON tel.LocationName = a.AgentName
INNER JOIN dbo.IW_User u
ON a.AgentNo = u.UserID
WHERE tel.OwnerID = 13313
AND tel.LocationSysID <> 15434;
SELECT * FROM #tempTarget WHERE LocationSysID NOT IN (15425, 15434);
GO
-- Create source table
IF OBJECT_ID('tempdb..#tempSource') IS NOT NULL DROP TABLE #tempSource
SELECT DISTINCT lpr.ProductID
INTO #tempSource
FROM dbo.Eval_LocationProductRelationship lpr
WHERE lpr.ProductID IN (16, 15, 13, 14, 12) --BETWEEN 15435 AND 15595
Sorry I could not get this into a DDL file as these are TEMP tables. Any help/direction would be appreciated. Thanks.
CROSS JOIN will be the best solution for your case.
If you only want 5 rows for each row in the first table, simply use the cross join query below.
SELECT B.ProductID,
A.[Status],
A.AgentNo,
A.UserID,
A.[Password] AS Value,
A.AdminID,
A.LocationSysID
FROM #tempTarget A
CROSS JOIN #tempSource B
If you want additional row with 0, then you have to insert a 0 into your second temp table and use the same query.
INSERT INTO #tempSource SELECT 0
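To see the row multiplication in isolation, here is a small self-contained sketch. The #demoTarget, #demoSource and #demoExpanded tables and their values are stand-ins invented for illustration, with the column list trimmed down from the question's #tempTarget.

```sql
-- Stand-in data: 2 agents and 5 products.
CREATE TABLE #demoTarget (ProductID int, AgentNo int);
CREATE TABLE #demoSource (ProductID int);

INSERT INTO #demoTarget (ProductID, AgentNo) VALUES (0, 101), (0, 102);
INSERT INTO #demoSource (ProductID) VALUES (12), (13), (14), (15), (16);

-- CROSS JOIN pairs every target row with every product:
-- 2 agents x 5 products = 10 rows, saved into a new temp table.
SELECT s.ProductID, t.AgentNo
INTO #demoExpanded
FROM #demoTarget t
CROSS JOIN #demoSource s;

SELECT ProductID, AgentNo
FROM #demoExpanded
ORDER BY AgentNo, ProductID;

DROP TABLE #demoTarget;
DROP TABLE #demoSource;
DROP TABLE #demoExpanded;
```

Writing the result to a new table avoids having to delete the original placeholder rows (ProductID = 0) from the target afterwards.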
If I understand correctly, the following is the scenario:
One Temp table has all the content.
select * from #withoutProducts
product table
select * from #products
Then the following is the query you are looking for:
select a.ProductID,[Status],AgentNo,UserID,[value]
from #products a cross join #withoutProducts b
order by AgentNO,a.productID
I have one table (Table1) that has several columns used in combination: Name, TestName, DevName, Dept. When each of these 4 columns have values, the record is inserted into Table2. I need to confirm that all of the records with existing values in each of these fields within Table1 were correctly copied into Table 2.
I have created a query for it:
SELECT DISTINCT wr.Name,wr.TestName, wr.DEVName ,wr.Dept
FROM table2 wr
where NOT EXISTS (
SELECT NULL
FROM TABLE1 ym
WHERE ym.Name = wr.Name
AND ym.TestName = wr.TestName
AND ym.DEVName = wr.DEVName
AND ym.Dept = wr.Dept
)
My counts are not adding up, so I believe that this is incorrect. Can you advise me on the best way to write this query for my needs?
You can use the EXCEPT set operator for this one if the table definitions are identical.
SELECT DISTINCT ym.Name, ym.TestName, ym.DEVName, ym.Dept
FROM table1 ym
EXCEPT
SELECT DISTINCT wr.Name, wr.TestName, wr.DEVName, wr.Dept
FROM table2 wr
This returns distinct rows from the first table where there is not a match in the second table. Read more about EXCEPT and INTERSECT here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-except-and-intersect-transact-sql?view=sql-server-2017
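One difference worth checking when the counts do not add up is NULL handling: EXCEPT treats two NULLs as equal, while an equality comparison inside NOT EXISTS does not. The sketch below uses made-up #t1 and #t2 tables with two of the four columns, and assumes the default ANSI_NULLS ON setting.

```sql
CREATE TABLE #t1 (Name varchar(20), Dept varchar(20));
CREATE TABLE #t2 (Name varchar(20), Dept varchar(20));

INSERT INTO #t1 (Name, Dept) VALUES ('Ann', NULL), ('Bob', 'IT');
INSERT INTO #t2 (Name, Dept) VALUES ('Ann', NULL);

-- EXCEPT considers the two ('Ann', NULL) rows equal,
-- so only ('Bob', 'IT') is returned.
SELECT Name, Dept FROM #t1
EXCEPT
SELECT Name, Dept FROM #t2;

-- With "=", NULL never matches NULL, so ('Ann', NULL) finds no partner
-- and both rows of #t1 are returned.
SELECT a.Name, a.Dept
FROM #t1 a
WHERE NOT EXISTS (
    SELECT 1
    FROM #t2 b
    WHERE b.Name = a.Name
      AND b.Dept = a.Dept
);

DROP TABLE #t1;
DROP TABLE #t2;
```

If only rows where all four columns have values are copied, adding IS NOT NULL filters to the Table1 side keeps the two approaches consistent.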
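Your approach should do the job, but the direction is reversed: your query returns rows that are in Table2 but not in Table1. The query below checks for anything that is in Table1 but not in Table2: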
SELECT ym.Name, ym.TestName, ym.DEVName, ym.Dept
FROM Table1 ym
WHERE NOT EXISTS (
SELECT 1
FROM table2
WHERE ym.Name = Name AND ym.TestName = TestName AND ym.DEVName = DEVName AND ym.Dept = Dept
)
If the structure of both tables is the same, EXCEPT is probably simpler.
IF OBJECT_ID(N'tempdb..#table1') IS NOT NULL drop table #table1
IF OBJECT_ID(N'tempdb..#table2') IS NOT NULL drop table #table2
create table #table1 (id int, value varchar(10))
create table #table2 (id int)
insert into #table1(id, value) VALUES (1,'value1'), (2,'value2'), (3,'value3')
--test here. Comment next line
insert into #table2(id) VALUES (1) --Comment/Uncomment
select * from #table1
select * from #table2
select #table1.*
from #table1
left JOIN #table2 on
#table1.id = #table2.id
where (#table2.id is not null or not exists (select * from #table2))
This is my first question on here, so I apologize if I break any rules.
Here's the situation. I have a table that lists all the employees and the building to which they are assigned, plus training hours, with ssn as the id column. I have another table that lists all the employees in the company, also with ssn, but including name and other personal data. The second table contains multiple records for each employee, at different points in time. What I need to do is select all the records in the first table for a certain building, then get the most recent name from the second table, and allow the result set to be sorted by any of the columns returned.
I have this in place, and it works fine, it is just very slow.
A very simplified version of the tables are:
table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours DEC(5,2)) (7200 rows)
table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), sequence INT) (708,000 rows)
The sequence column in table 2 is a number that corresponds to a predetermined date to enter these records, the higher number, the more recent the entry. It is common/expected that each employee has several records. But several may not have the most recent(i.e. '8').
My SProc is:
@BuildingNumber CHAR(7), @SortField VARCHAR(25)
BEGIN
DECLARE @returnValue TABLE(ssn CHAR(9), buildingNumber CHAR(7), fname VARCHAR(20), lName VARCHAR(20), rowNumber INT)
INSERT INTO @returnValue(...)
SELECT(ssn,buildingNum,fname,lname,rowNum)
FROM SELECT(...,CASE @SortField Row_Number() OVER (PARTITION BY buildingNumber ORDER BY {sortField column} END AS RowNumber)
FROM table1 a
OUTER APPLY(SELECT TOP 1 fName,lName FROM table2 WHERE ssn = a.ssn ORDER BY sequence DESC) AS e
where buildingNumber = @BuildingNumber
SELECT * from @returnValue ORDER BY RowNumber
END
I have indexes for the following:
table1: buildingNumber(non-unique,nonclustered)
table2: sequence_ssn(unique,nonclustered)
Like I said this gets me the correct result set, but it is rather slow. Is there a better way to go about doing this?
It's not possible to change the database structure or the way table 2 operates. Trust me if it were it would be done. Are there any indexes I could make that would help speed this up?
I've looked at the execution plans, and it has a clustered index scan on table 2(18%), then a compute scalar(0%), then an eager spool(59%), then a filter(0%), then top n sort(14%).
That's 78% of the execution so I know it's in the section to get the names, just not sure of a better(faster) way to do it.
The reason I'm asking is that table 1 needs to be updated with current data. This is done through a webpage with a radgrid control. It has a range, start index, all that, and it takes forever for the users to update their data.
I can change how the update process is done, but I thought I'd ask about the query first.
Thanks in advance.
I would approach this with window functions. The idea is to assign a sequence number to the records in the table with duplicates (I think table2), such that the most recent record for each ssn has a value of 1. Then just select the rows with that value as the most recent records:
select t1.*, t2.*
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber;
My second suggestion is to use a user-defined function rather than a stored procedure:
create function XXX (
@BuildingNumber CHAR(7) -- matches table1.buildingNumber in the question
)
returns table as
return (
select t1.ssn, t1.buildingNumber, t2.fName, t2.lName, t2.seqnum
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber
);
(This doesn't have the logic for the ordering because that doesn't seem to be the central focus of the question.)
You can then call it as:
select *
from dbo.XXX(<building number>);
EDIT:
The following may speed it up further, because you are only selecting a small(ish) subset of the employees:
select *
from (select t1.ssn, t1.buildingNumber, t1.trainingHours, t2.fName, t2.lName,
row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber
) t
where seqnum = 1;
And, finally, I suspect that the following might be the fastest:
select t1.*, t2.*, row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber and
t2.sequence = (select max(sequence) from table2 t2a where t2a.ssn = t1.ssn)
In all these cases, an index on table2(ssn, sequence) should help performance.
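A sketch of such an index follows. The index name is hypothetical, and the descending sort on sequence plus the INCLUDE list are assumptions based on the columns the queries above read from table2; the existing index leads with sequence, which does not help a lookup by ssn.

```sql
-- Hypothetical index name. Leading with ssn lets each employee's rows be
-- found with a seek; sequence DESC puts the most recent row first.
-- INCLUDE covers the name columns so the base table need not be read.
CREATE NONCLUSTERED INDEX IX_table2_ssn_sequence
    ON table2 (ssn, sequence DESC)
    INCLUDE (fName, lName);
```

Check the execution plan afterwards to confirm that the scan and spool on table2 have been replaced by a seek on the new index.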
Try using temp tables instead of table variables. Not sure what kind of system you are working on, but I have had pretty good luck with them. Both kinds are stored in tempdb, but temp tables get column statistics, which table variables do not, so the optimizer can usually estimate row counts better. Depending on other system usage this might do the trick.
Simply define the temp table using #Tablename instead of @Tablename. Put the name sorting subquery in a temp table before everything else fires off and make a join to it.
A temp table created inside a stored procedure is dropped automatically when the procedure ends, but it is a good idea to drop it explicitly to be on the safe side.
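As a rough sketch of that idea, using the table and column names from the question: the #LatestNames name and the @BuildingNumber value are made up for illustration, and the sorting logic is left out.

```sql
-- Stand-in for the procedure parameter; the value is hypothetical.
DECLARE @BuildingNumber CHAR(7) = '0000001';

IF OBJECT_ID('tempdb..#LatestNames') IS NOT NULL
    DROP TABLE #LatestNames;

-- Step 1: most recent name per employee, limited to the requested building.
SELECT x.ssn, x.fName, x.lName
INTO #LatestNames
FROM (
    SELECT t2.ssn, t2.fName, t2.lName,
           ROW_NUMBER() OVER (PARTITION BY t2.ssn
                              ORDER BY t2.sequence DESC) AS seqnum
    FROM table2 t2
    WHERE EXISTS (SELECT 1
                  FROM table1 t1
                  WHERE t1.ssn = t2.ssn
                    AND t1.buildingNumber = @BuildingNumber)
) x
WHERE x.seqnum = 1;

-- Step 2: join the small temp table back to table1.
-- LEFT JOIN keeps employees without a table2 row, like the OUTER APPLY did.
SELECT t1.ssn, t1.buildingNumber, t1.trainingHours, n.fName, n.lName
FROM table1 t1
LEFT JOIN #LatestNames n ON n.ssn = t1.ssn
WHERE t1.buildingNumber = @BuildingNumber;

DROP TABLE #LatestNames;
```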
I have two tables with more than 800 rows in each. The table names are 'education' and 'sanitation'. The column 'ID' is common to both tables. Now I want to join these two tables with a full outer join and save the result as a new table. I can join them very easily, but how do I save the data as a new table? Please help me.
select * into bc from education e join sanitation s on e.id=s.id
I have around 30 columns in each table, so I cannot explicitly create the table schema for a new table.
I want all the columns from both tables. I have 20 tables with 800 rows each. From these 20 tables I want to make one master table, using 'ID' as the primary key in all of them.
Sample Code:
Table one:
create table dummy1(
id int , fname varchar(50)
)
insert into dummy1 (id,fname) values (1,'aaa')
insert into dummy1 (id,fname) values (2,'bbb')
insert into dummy1 (id,fname) values (3,'ccc')
insert into dummy1 (id,fname) values (3,'ccc')
Table Two
create table dummy2(
id int , lname varchar(50)
)
insert into dummy2 (id,lname) values (1,'abc')
insert into dummy2 (id,lname) values (2,'pqr')
insert into dummy2 (id,lname) values (3,'mno')
Now Create new table 3
create table dummy3(
id int , fname varchar(50),lname varchar(50)
)
Insert Query for table 3 look like
insert into dummy3 (id,fname,lname)
(select a.id,a.fname,b.lname from dummy1 a inner join dummy2 b on a.id=b.id)
Table 3 will contain table1, table2 data
Follow below:
SELECT t1.Column1, t2.Columnx
INTO DestinationTable
FROM education t1
INNER JOIN sanitation t2 ON t1.Id = t2.Id
EDIT:
SELECT * will not work for you because the column ID exists in both tables, and SELECT ... INTO requires unique column names. So the above solution, with an explicit column list, will work for you.
EDIT:
1- You can temporarily rename the Id column in one table, then try
2- SELECT *
INTO DestinationTable
FROM education t1
INNER JOIN sanitation t2 ON t1.Id = t2.Id
3- Revert the column name back to Id.
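Since the question asks for a full outer join, another option is to list the columns and merge the two ID columns with COALESCE, which avoids the duplicate column name without renaming anything. The sketch below uses small stand-in temp tables (#education, #sanitation, #master) with one extra column each; the real tables would need their ~30 columns listed the same way.

```sql
CREATE TABLE #education  (id int, fname varchar(50));
CREATE TABLE #sanitation (id int, lname varchar(50));

INSERT INTO #education  (id, fname) VALUES (1, 'aaa'), (2, 'bbb');
INSERT INTO #sanitation (id, lname) VALUES (2, 'pqr'), (3, 'mno');

-- COALESCE yields a single id column even when a row exists on one side only.
SELECT COALESCE(e.id, s.id) AS id, e.fname, s.lname
INTO #master
FROM #education e
FULL OUTER JOIN #sanitation s ON e.id = s.id;

-- Three rows: id 1 (no lname), id 2 (both sides), id 3 (no fname).
SELECT id, fname, lname FROM #master ORDER BY id;

DROP TABLE #education;
DROP TABLE #sanitation;
DROP TABLE #master;
```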