Speed up an UPDATE with SELECT query - sql-server

I have two tables:
Table 1 has Episode and Code, with Episode distinct (one row per episode).
Table 2 has Episode and Code, but Episode is not distinct (other fields in the table, not relevant to the task, make each row unique).
I want to copy Table 1's Code across to Table 2 for each episode. The current code to do this is as follows:
UPDATE Table2
SET Table2.Code = (SELECT TOP 1 Code FROM Table1 WHERE Episode = Table2.Episode)
This takes hours and hours. (I don't know precisely how many hours, because I cancelled it at about the 20-hour mark.) They are big tables, but surely there's a faster way?

I don't have a SQL Server handy and I'm not completely sure, but I seem to recall there was a syntax like the following, which should speed things up considerably.
UPDATE Table2 SET Table2.Code = Table1.Code FROM Table1
WHERE Table1.Episode = Table2.Episode

Are there any indices on the "Code" and "Episode" columns on both tables? Those would definitely help speed things up quite a bit!
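For example, something along these lines might do it (a sketch only; the index names and the INCLUDE column are my guesses at what would cover the query):
-- Covers the per-episode lookup into Table1
CREATE INDEX IX_Table1_Episode ON Table1 (Episode) INCLUDE (Code);
-- Supports the correlation/join on the Table2 side
CREATE INDEX IX_Table2_Episode ON Table2 (Episode);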
Marc

You can use UPDATE with joins like this. Note that you have to specify FROM.
UPDATE T -- target the alias, since MyTable is aliased in the FROM clause
SET T.MyColVal = O.NewVal
FROM MyTable T
INNER JOIN MyOtherTable O ON T.Id = O.Id
WHERE ...
http://doc.ddart.net/mssql/sql70/ua-uz_3.htm
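Applied to the tables in the question, that pattern would look roughly like this (untested sketch; it assumes Episode is the join key as described):
UPDATE T2
SET T2.Code = T1.Code
FROM Table2 T2
INNER JOIN Table1 T1 ON T1.Episode = T2.Episode
Since Episode is distinct in Table1, the join matches at most one Table1 row per Table2 row, so the TOP 1 lookup isn't needed.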

Related

Find the Min and Max date from two tables from a sql select statement

Can't seem to wrap my head around this problem.
I have two tables, one of which has the following sample values:
The second table has the following values:
What I am trying to achieve is the following:
So you can see the first table has the modules, what year and what term.
Based on these there is a start week and an end week.
The lookup table for the start and the finish is unfortunately on a week basis. I need the BeginWeek to match the second table's WeekNo (based on the season, I guess), taking the start date (Sdate) from that table, and then the same applies to the end date:
match the season and the EndWeek with the second table's WeekNo and Edate and only bring that date in.
Hope I made a bit of sense, but I am hoping the third image shows what I am looking for.
I've tried CTEs, GROUP BY, PARTITION BY, ORDER BY, MIN, MAX and got nowhere :(
Don't really want to hard-code anything, so I was hoping you wonderful people can help me out!
Many thanks in advance :)
I suspect you are trying to achieve this by using a single join between the tables, whereas what you actually need is two separate joins:
SELECT table1.module AS mod_code,
       table1.season AS psl_code,
       table2.Sdate AS ypd_sdate,
       table3.Edate AS ypd_edate
FROM t1 AS table1
JOIN t2 AS table2 ON table2.yr = table1.year AND table2.season = table1.season AND table2.weekNo = table1.BeginWeek
JOIN t2 AS table3 ON table3.yr = table1.year AND table3.season = table1.season AND table3.weekNo = table1.EndWeek

SQL Server - Update All Records, Per Group, With Result of SubQuery

If anyone could even just help me phrase this question better I'd appreciate it.
I have a SQL Server table, let's call it cars, which contains entries representing items and information about their owners, including car_id, owner_accountNumber, and owner_numCars.
We're using a system that sorts 'importantness of owner' based on number of cars owned, and relies on the owner_numCars column to do so. I'd rather not adjust this, if reasonably possible.
Is there a way I can update owner_numCars per owner_accountNumber using a stored procedure? Or is there some other, more efficient way to make every owner_numCars contain the count of entries for its owner_accountNumber?
Right now the only way I can think of to do this (from the C# application) is to run:
SELECT owner_accountNumber, COUNT(*)
FROM mytable
GROUP BY owner_accountNumber;
and then, for each row returned by that query, run:
UPDATE mytable
SET owner_numCars = <count result>
WHERE owner_accountNumber = <accountNumber result>
But this seems wildly inefficient compared to having the server handle the logic and updates.
Edit - Thanks for all the help. I know this isn't really a well set up database, but it's what I have to work with. I appreciate everyone's input and advice.
This solution takes into account that you want to keep the owner_numCars column in the CARS table and that the column should always be accurate in real time.
I'm defining table CARS as a table with attributes about cars, including each car's current owner. The number of cars owned by the current owner is de-normalized into this table. Say I, LAS, own three cars; then there are three entries in table CARS, as such:
car_id  owner_accountNumber  owner_numCars
1       LAS1                 3
2       LAS1                 3
3       LAS1                 3
For owner_numCars to be used as an importance factor in a live interface, you'd need to update owner_numCars for every car every time LAS1 sells or buys a car or is removed from or added to a row.
Note you need to update CARS for both the old and new owners. If Sam buys car1, both Sam's and LAS' totals need to be updated.
You can use this procedure to update the rows. This SP is very context-sensitive: it needs to be called after rows have been deleted or inserted for the deleted or inserted owner, and when an owner is updated, it needs to be called for both the old and new owners.
To update in real time as accounts change owners:
create procedure update_car_count
@p_acct nvarchar(50) -- use your actual datatype here
AS
update CARS
set owner_numCars = (select count(*) from CARS where owner_accountNumber = @p_acct)
where owner_accountNumber = @p_acct;
GO
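For example, when car 1 changes hands from LAS1 to a hypothetical account SAM1, you'd call it once for each affected owner:
UPDATE CARS SET owner_accountNumber = 'SAM1' WHERE car_id = 1;
EXEC update_car_count @p_acct = 'LAS1'; -- old owner's count drops to 2
EXEC update_car_count @p_acct = 'SAM1'; -- new owner's count is recomputed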
To update all account_owners:
create procedure update_car_count_all
AS
update C
set owner_numCars = (select count(*) from CARS where owner_accountNumber = C.owner_accountNumber)
from CARS C
GO
I think what you need is a View. If you don't know, a View is a virtual table that displays/calculates data from a real table and is continuously updated as the table data updates. So if you want to see your table with owner_numCars added you could do:
SELECT a.*, b.owner_numCars
from mytable as a
inner join
(SELECT owner_accountNumber, COUNT(*) as owner_numCars
FROM mytable
GROUP BY owner_accountNumber) as b
on a.owner_accountNumber = b.owner_accountNumber
You'd want to remove the owner_numCars column from the real table since you don't need to actually store that data on each row. If you can't remove it you can replace a.* with an explicit list of all the fields except owner_numCars.
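If it helps, here is a sketch of that as an actual CREATE VIEW, listing columns explicitly to avoid duplicating owner_numCars (the view name, and the assumption that car_id and owner_accountNumber are the fields you need, are mine):
CREATE VIEW vCarsWithOwnerCounts
AS
SELECT a.car_id,
       a.owner_accountNumber,
       b.owner_numCars
FROM mytable AS a
INNER JOIN
    (SELECT owner_accountNumber, COUNT(*) AS owner_numCars
     FROM mytable
     GROUP BY owner_accountNumber) AS b
    ON a.owner_accountNumber = b.owner_accountNumber;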
You don't want to run SQL to update this value. What if it doesn't run for a long time? What if someone loads a lot of data and then runs the scoring, and a guy who has 100 cars counts as zero because the update didn't run? Data should only live in one place; updating makes it live in two. You want a view that pulls this value from the tables as it is needed.
CREATE VIEW vOwnersInfo
AS
SELECT o.*,
       ISNULL(c.Cnt, 0) AS Cnt
FROM OWNERS o
LEFT JOIN
    (SELECT OwnerId,
            COUNT(1) AS Cnt
     FROM Cars
     GROUP BY OwnerId) AS c
    ON o.OwnerId = c.OwnerId
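Anything that needs the count then just queries the view, e.g.:
SELECT OwnerId, Cnt
FROM vOwnersInfo
ORDER BY Cnt DESC; -- 'importance' computed on demand, never stale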
There are a lot of ways of doing this. Here is one way, using the COUNT() OVER window function and an updatable Common Table Expression [CTE], so that you won't have to worry about relating the data back by ids, etc.
;WITH cteCarCounts AS (
SELECT
owner_accountNumber
,owner_numCars
,NewNumberOfCars = COUNT(*) OVER (PARTITION BY owner_accountNumber)
FROM
MyTable
)
UPDATE cteCarCounts
SET owner_numCars = NewNumberOfCars
However, from a design perspective I would raise the question of whether this value (owner_numCars) should be on this table or on what I assume would be the owner table.
Rominus did make a good point of using a view if you want the data to always reflect the current value. You could also do it with a table-valued function, which could be more performant than a view (see the sketch after this answer). But if you are simply showing it, then you could do something like this:
SELECT
owner_accountNumber
,owner_numCars = COUNT(*) OVER (PARTITION BY owner_accountNumber)
FROM
MyTable
By adding a where clause to either the CTE or the SELECT statement you will effectively limit your dataset and the solution should remain fast. E.g.
WHERE owner_accountNumber = @owner_accountNumber
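The table-valued function route might look something like this, written as an inline TVF so the optimizer can expand it like a view (the function name and the nvarchar(50) parameter type are assumptions; use your actual datatype):
CREATE FUNCTION dbo.fn_OwnerCarCount (@acct NVARCHAR(50))
RETURNS TABLE
AS
RETURN
(
    SELECT owner_accountNumber,
           COUNT(*) AS owner_numCars
    FROM MyTable
    WHERE owner_accountNumber = @acct
    GROUP BY owner_accountNumber
);
GO
-- Usage (the account number is made up):
SELECT * FROM dbo.fn_OwnerCarCount(N'ACCT-1001');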

How to join two tables based on Grouping of 1 column in both the tables

I have come up against a situation which is not easy to explain in a sentence, so I will go ahead and give the complete scenario here.
I have one result set like the one below:
It shows header_equipment_id(s) grouped by jil_equipment_id, relationship_name and cell_group. For example, 3159398 and 4622903 lie in one group.
The other result set is given below. This is the table where I want to update 3 columns, namely Is_Applicable_Price, prc_content_rid and prc_type_name.
If you look closely, you will find the same header_equipment_id column here. If you group it with the result found above, you will find 3 different groups. But out of those 3 groups, one group is red; it is red because its rows belong to a different cell_group/relationship_name.
Yellow and green are passing scenarios; red and blue are failing.
I want to update the columns Is_Applicable_Price, prc_content_rid and prc_type_name if the group of header_equipment_id(s) falls under the same cell_group and relationship_name.
So the final result set would look something like the one below.
Please help me with any inputs if possible. I know a single query won't work here and I will need multiple temp tables for the transformation, but this is the shortest approach I have come across.
I am using Microsoft SQL Server 2012.
Please help. Even a small hint would be of great help to me. Thanks in advance.
It seems that the only thing the 2 tables have in common is that a cell_group can have one or more rows of header_equipment_id. If we can generate a unique value based on the header_equipment_id(s), then we can join the 2 tables on this value. Note I have used a simple division; you may wish to check that this method is unique enough for your purposes.
/*create table a
(jil_equimentid int,relationship_name varchar(20),header_equipment_id int,
smart_equipment_id int,cell_group int,new_price_flag int,is_applicable_price int,prc_content_rid int,prc_type_name varchar(20))
truncate table a
insert into a values
(1282977,'default',3159398,1282977,3,1,1,106347924,'New Price'),
(1282977,'default',4622903,1262578,3,1,1,106347924,'New Price'),
(1282977,'default',1659861,1282977,6,1,1,106347925,'New Price'),
(1282977,'default',4622904,1282977,6,1,1,106347925,'New Price')
go
drop table t
go
create table t
(jil_equimentid int,relationship_name varchar(20),header_equipment_id int,
smart_equipment_id int,cell_group int,new_price_flag int,is_applicable_price int,prc_content_rid int,prc_type_name varchar(20))
truncate table t
insert into t values
(1282977,'128297711111 default',4622903,1282977,1,1,null,null,null),
(1282977,'128297711211 default',3159398,1262578,2,1,null,null,null),
(1282977,'128297712111 default',4622904,1282977,4,1,null,null,null),
(1282977,'128297712211 default',1659861,1282977,5,1,null,null,null),
(1282977,'128297711101 default',3159398,1262578,1,1,null,null,null),
(1282977,'128297711101 default',4622903,1282977,1,1,null,null,null),
(1282977,'default' ,3159398,1262578,2,1,null,null,null),
(1282977,'default' ,4622903,1282977,2,1,null,null,null),
(1282977,'128297711101 default',1659861,1262577,3,1,null,null,null),
(1282977,'128297711101 default',4622904,1282977,3,1,null,null,null),
(1282977,'default' ,1659861,1262577,4,1,null,null,null),
(1282977,'default' ,4622904,1262577,4,1,null,null,null)
*/
IF OBJECT_ID('tempdb..#TEMPA') IS NOT NULL DROP TABLE #TEMPA;
;WITH CTE AS
(SELECT a.cell_group,
sum(a.header_equipment_id / 10000000.0000) uniqueval
from a
group by a.cell_group
)
SELECT DISTINCT CTE.UNIQUEVAL ,IS_APPLICABLE_PRICE ,PRC_CONTENT_RID ,PRC_TYPE_NAME
INTO #TEMPA
FROM CTE
JOIN A ON A.CELL_GROUP = CTE.CELL_GROUP
;WITH CTE AS
(
SELECT t.relationship_name,t.cell_group,
sum(t.header_equipment_id / 10000000.0000) uniqueval
from t
group by t.relationship_name,t.cell_group having count(*) > 1
)
SELECT T.*,CTE.UNIQUEVAL,ta.*
FROM CTE
JOIN T ON T.RELATIONSHIP_NAME = CTE.RELATIONSHIP_NAME AND T.CELL_GROUP = CTE.CELL_GROUP
join #tempa ta on ta.uniqueval = cte.uniqueval
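If you're worried about whether the division trick is unique enough, CHECKSUM_AGG is a built-in alternative fingerprint available in SQL Server 2012, though it can collide as well, so the same caveat applies; a sketch against table a:
SELECT cell_group,
       CHECKSUM_AGG(header_equipment_id) AS grp_fingerprint
FROM a
GROUP BY cell_group;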

Multi join issue

EDIT: Thanks for all the input, and sorry for the late reply. I have been away during the weekend without access to the internet. I realized from the answers that I needed to provide more information so people could understand the problem more thoroughly, so here it comes:
I am migrating an old database design to a new design. The old one is a mess and very confusing (I wasn't involved in the old design). I've attached a picture of the relevant part of the old design below:
The table called Item will exist in the new design as well, and it has all the columns I need in the new design except one, and it is here my problem begins. I need the column which I named 'neededProp' to be associated (by associated I mean become a column in the new Item table in the new design) with each migrated row from Item.
So for a particular eid in table Environment there can be n entries in table Item. The "corresponding" set exists in table Room. The only way to know which rows are associated in Item and Room is with the help of the columns "itemId" and "objectId" in the respective tables. So for a particular eid there might be 100 entries in Item and Room, and their "itemId" and "objectId" can be values from 1 to 100; that column is only unique for a particular eid (or baseSeq, as it is called in table BaseFile).
Basically, the tables Environment and BaseFile resemble each other, and the tables Item and Room resemble each other. The difference is that some tables lack some columns and others may have some extra. I have no idea why it was designed like this from the beginning.
My question is whether someone can help me create a query so that I can find the proper "neededProp" for each row in the Item table, so I can get that data into the new design?
OLD PART: This might be a trivial question but I can't get it to work the way I want. I want to join a few tables as in the SQL statement below. If I start like this and run this query
select * from Environment e
join items ei on e.eid = ei.eid
I get like 400000 rows which is what I want. However if I add one more line so it looks like this:
select * from Environment e
join items ei on e.eid= ei.eid
left join Room r on e.roomnr = r.roomobjectnr
I get an insane number of rows, so there must be some multiplication going on. I want to get the same number of rows (like 400000 in this case) even after joining the third table. Is that possible somehow? Maybe something like creating a temporary view from the first join.
I am using MS SQL Server.
So without knowing what data you have in your second query, it's very difficult to say exactly how to write this out. You're likely having a problem where there's an additional column in Room that you have forgotten to join on, such as something indicating a facility or hallway, so that you have multiple 'Room 1' entries, as an example.
However, to answer your question regarding another way to write this out without using a temp table, I've crufted up the below as an example of using a common table expression which will only return one record per source row.
;WITH cte_EnvironmentItems AS (
SELECT * -- in practice, list the columns you need: a CTE disallows duplicate output names (eid appears in both tables)
FROM Environment E
INNER JOIN Items I ON I.eid = E.eid
), cte_RankedRoom AS (
SELECT *
,ROW_NUMBER() OVER (PARTITION BY R.roomobjectnr ORDER BY R.UpdateDate DESC) [RN] -- one row per room, newest first
FROM Room R
)
SELECT *
FROM cte_EnvironmentItems E
LEFT JOIN cte_RankedRoom R ON E.roomnr = R.roomobjectnr
AND R.RN = 1
By the way, do you want columns from the Room table? If not, then:
select * from Environment e
join items ei on e.eid= ei.eid
where e.roomnr in (select r.roomobjectnr from Room r )
Otherwise:
select * from Environment e
join items ei on e.eid= ei.eid
left join (select distinct roomobjectnr from Room) r on e.roomnr = r.roomobjectnr

Is the 'BETWEEN' function very expensive in SQL Server?

I'm trying to join two relatively simple tables together, but my query is experiencing serious hangups. I'm not sure why, but I think it might have something to do with the 'between' function. My first table looks something like this (with a lot of other columns, but this would be the only column I'm pulling):
RowNumber
1
2
3
4
5
6
7
8
My second table "groups" my rows into "blocks", and has the following schema:
BlockID  RowNumberStart  RowNumberStop
1        1               3
2        4               7
3        8               8
The desired result is to link each RowNumber with its BlockID, keeping the same number of rows as the first table. So the result would look like this:
RowNumber  BlockID
1          1
2          1
3          1
4          2
5          2
6          2
7          2
8          3
In order to get that, I used the following query, writing the results into a temp table:
select A.RowNumber, B.BlockID
into TEMP_TABLE
from TABLE_1 A left join TABLE_2 B
on A.RowNumber between B.RowNumberStart and B.RowNumberStop
TABLE_1 and TABLE_2 are actually very large tables. Table 1 is about 122M Rows, and TABLE_2 is about 65M rows. In TABLE_1, the RowNumber is defined as a 'bigint', and in TABLE_2, the BlockID, RowNumberStart, and RowNumberStop are all defined as 'int'. Not sure that makes a difference, but just wanted include that information, too.
The query has now been hung up for eight hours. Similar queries on this type and volume of data are not taking anywhere near this long. So I'm wondering if it could be the 'between' statement that's hanging up this query.
Definitely would welcome any suggestions on how to make this more efficient.
BETWEEN is simply shorthand for:
select A.RowNumber, B.BlockID
into TEMP_TABLE
from TABLE_1 A left join TABLE_2 B
on A.RowNumber >= B.RowNumberStart AND A.RowNumber <= B.RowNumberStop
If the execution plan goes from B to A (but the left join would indicate it has to go from A to B, really), then I'm assuming TABLE_1 is indexed on RowNumber (and that index should be covering for this query). If it's only got a clustered index on RowNumber and the table is very wide, I recommend a non-clustered index on RowNumber alone, since you'll fit a lot more rows per page that way.
Otherwise, you want an index on TABLE_2 on RowNumberStart DESC or RowNumberStop ASC, because for a given A you'd need a descending seek on RowNumberStart to find the matching block.
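Concretely, something like this is what I have in mind (the index names, and the choice to INCLUDE BlockID so the seek never touches the base table, are illustrative):
-- Narrow non-clustered index so the pass over TABLE_1 touches far fewer pages
CREATE NONCLUSTERED INDEX IX_T1_RowNumber ON TABLE_1 (RowNumber);
-- Seek support for the range predicate on TABLE_2
CREATE NONCLUSTERED INDEX IX_T2_Start ON TABLE_2 (RowNumberStart) INCLUDE (RowNumberStop, BlockID);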
I think you might want to change your join to an INNER JOIN, the way your join criteria are set up. (Is a TABLE_1 row ever going to fall in no block?)
If you look at your execution plan, you should get more clues as to why the performance might be bad, but the Stop criterion is probably not used in the seek into TABLE_1.
Unfortunately SQLMenace's answer about the SELECT INTO has been deleted. My comment regarding that was meant to be: @Martin, SELECT INTO performance isn't as bad as it once was, but I still recommend CREATE TABLE for most production use because SELECT INTO will infer types and NULLability. This is fine if you verify it is doing what you think it is doing, but creating a super-long varchar or a decimal column with very strange precision can result in not only odd tables but performance issues (especially with some of those big varchars when you forget a LEFT or whatever). I think it just helps to make it clear what you are expecting the table to look like. Often I will SELECT INTO using WHERE 0 = 1, check out the schema, and then script it with my tweaks (like adding an IDENTITY or adding a column with a timestamp default).
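That last trick looks like this:
-- Creates an empty TEMP_TABLE with the schema SELECT INTO would infer; no rows are copied
SELECT A.RowNumber, B.BlockID
INTO TEMP_TABLE
FROM TABLE_1 A LEFT JOIN TABLE_2 B
    ON A.RowNumber BETWEEN B.RowNumberStart AND B.RowNumberStop
WHERE 0 = 1;
-- Script out the resulting table, tweak it (IDENTITY, defaults, types), then load with INSERT ... SELECT.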
You have one main problem: you are trying to handle too much data volume at once. Are you really sure you want to handle the result of ALL 122M rows from table 1 at once? Do you really need that?
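If you genuinely do need all of it, one way to keep any single statement manageable is to process in RowNumber slices; a rough sketch (the 1M batch size is arbitrary, and the 122M bound comes from the question):
DECLARE @start BIGINT = 1, @batch BIGINT = 1000000;
WHILE @start <= 122000000
BEGIN
    -- assumes TEMP_TABLE was created up front, e.g. via the WHERE 0 = 1 trick above
    INSERT INTO TEMP_TABLE (RowNumber, BlockID)
    SELECT A.RowNumber, B.BlockID
    FROM TABLE_1 A LEFT JOIN TABLE_2 B
        ON A.RowNumber BETWEEN B.RowNumberStart AND B.RowNumberStop
    WHERE A.RowNumber >= @start AND A.RowNumber < @start + @batch;
    SET @start += @batch;
END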
