Select from multiple tables without cartesian product - sql-server

I'm trying to create a command to copy the details of client memberships from one database to another with identical structure. I've pared it down to the bare essentials for the purposes of the question so the four items to copy are the expiry date, the subscription ID, the client ID and the item ID (which is the service which comprises the subscription).
The clients have a common GUID in both databases. The subscription ID is a unique long int which should be the same in both databases, and the expiry date is just a date. So far, so easy. The tricky part is that the item_id is not necessarily the same in each database. I need to map from one to the other with a WHERE clause, which I know how to do.
My problem is that I need to select from the destination database's own ITEM table (item_0) in order to get and insert the correct item_id and when I do this I get thousands of duplicate rows returned. I assume I need to use a join to avoid this but as I have nothing meaningful to join item_0 to I can't get any results.
insert into DestDB..subscription (expiry_date,id,client_id,item_id)
select
    sub_1.expiry_date,
    sub_1.id,
    cli_0.id as client_id,
    item_0.id as item_id
from SourceDB..subscription sub_1,
     DestDB..item item_0,
     DestDB..client cli_0
     inner join SourceDB..client cli_1
         on cli_1.[guid] = cli_0.[guid]
where sub_1.id not in (select id from DestDB..subscription)
and item_0.id =
    (select id from DestDB..collectiondetails
     where service_ID =
         (select id from DestDB..service s_0 where s_0.code =
             (select code from SourceDB..service s_1 where s_1.id =
                 (select service_ID from SourceDB..collectiondetails item_1 where item_1.id = sub_1.item_id)))
     and collection_ID =
         (select id from DestDB..collection col_0
          where col_0.code =
             (select code from SourceDB..collection col_1 where col_1.id =
                 (select collection_ID from SourceDB..collectiondetails item_1 where item_1.id = sub_1.collection_ID)))
    )

I am afraid the updated question is even more confusing. Is the subquery in the WHERE clause guaranteed to return only one record, or should the comparison use IN instead of =?
If there is no rule for picking a particular DestDB..item from the list of matches, then the top one should do as well. It would still seem that item_0 can be omitted altogether:
insert into DestDB..subscription (expiry_date,id,client_id,item_id)
select
    sub_1.expiry_date,
    sub_1.id,
    cli_0.id as client_id,
    (select Top 1 id from DestDB..collectiondetails --<- Limit to top 1 only
     where service_ID =
         (select id from DestDB..service s_0 where s_0.code =
             (select code from SourceDB..service s_1 where s_1.id =
                 (select service_ID from SourceDB..collectiondetails item_1 where item_1.id = sub_1.item_id)))
     and collection_ID =
         (select id from DestDB..collection col_0
          where col_0.code =
             (select code from SourceDB..collection col_1 where col_1.id =
                 (select collection_ID from SourceDB..collectiondetails item_1 where item_1.id = sub_1.collection_ID)))
    ) as item_id
from SourceDB..subscription sub_1,
     DestDB..client cli_0
     inner join SourceDB..client cli_1
         on cli_1.[guid] = cli_0.[guid]
where sub_1.id not in (select id from DestDB..subscription)
Please note: if DestDB..item is empty, the statement in the question would not insert anything, whereas the statement in this answer would insert rows with item_id set to NULL.
I personally would try splitting this task into two separate statements in one transaction:
1. Insert into target table with NULL item_id.
2. Update target table with new item_id where item_id is NULL.
3. (optional) Delete unwanted records where appropriate item_id could not be found.
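The two-statement approach could be sketched roughly as follows. This is only an illustration run against an in-memory SQLite database through Python's built-in sqlite3 module, not SQL Server; the table names (src_item, dest_item, src_subscription, dest_subscription) and the mapping of items by a shared code column are simplified assumptions, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for the SourceDB/DestDB tables.
    CREATE TABLE src_item (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE dest_item (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE src_subscription (id INTEGER PRIMARY KEY, expiry_date TEXT, item_id INTEGER);
    CREATE TABLE dest_subscription (id INTEGER PRIMARY KEY, expiry_date TEXT, item_id INTEGER);

    INSERT INTO src_item VALUES (1, 'GYM'), (2, 'POOL'), (3, 'SAUNA');
    INSERT INTO dest_item VALUES (10, 'POOL'), (20, 'GYM');  -- no SAUNA in the destination
    INSERT INTO src_subscription VALUES
        (100, '2025-01-31', 1), (101, '2025-02-28', 2), (102, '2025-03-31', 3);
""")

with conn:  # one transaction: commits on success, rolls back on an exception
    # Step 1: copy the missing subscriptions, leaving item_id NULL for now.
    conn.execute("""
        INSERT INTO dest_subscription (id, expiry_date, item_id)
        SELECT s.id, s.expiry_date, NULL
        FROM src_subscription s
        WHERE s.id NOT IN (SELECT id FROM dest_subscription)
    """)
    # Step 2: fill in item_id by mapping source item -> code -> destination item.
    conn.execute("""
        UPDATE dest_subscription
        SET item_id = (
            SELECT di.id
            FROM src_subscription s
            JOIN src_item si ON si.id = s.item_id
            JOIN dest_item di ON di.code = si.code
            WHERE s.id = dest_subscription.id
        )
        WHERE item_id IS NULL
    """)
    # Step 3 (optional): remove rows whose item could not be mapped.
    conn.execute("DELETE FROM dest_subscription WHERE item_id IS NULL")

rows = conn.execute("SELECT id, item_id FROM dest_subscription ORDER BY id").fetchall()
print(rows)
```

Because the item lookup is a correlated scalar subquery rather than an extra table in the FROM list, it cannot multiply the number of rows.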

Related

Update multiple records with duplicate column value

I have a query that identifies how many times a ChassisNo was used:
Query:
SELECT
    ROW_NUMBER() OVER (
        PARTITION BY ChassisNo
        ORDER BY datecreated ASC
    ) row_num,
    CollateralType,
    LoanID,
    ClientID,
    CollateralID,
    PlateNo,
    ChassisNo,
    EngineNo,
    datecreated,
    PreparedBy
FROM
    TestAllLoanWithCollaterals
Result:
I highlighted an example of a ChassisNo that is duplicated three times; some ChassisNo values are duplicated five times or so. The main question is: how can I update all records that share a ChassisNo with the details of the latest record for that ChassisNo?
Expected result, based on the highlighted example above:
The yellow highlight is the latest record based on the datecreated column, which is always the last row_num of each ChassisNo. The blue highlight marks the columns that should be updated.
I am thinking of using the Database Cursor but I don't think it is possible.
You may use an update join involving your original table and the logic you have already defined:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ChassisNo ORDER BY datecreated DESC) rn
    FROM TestAllLoanWithCollaterals
)
UPDATE a
SET
    CollateralType = b.CollateralType,
    LoanID = b.LoanID,
    ClientID = b.ClientID,
    CollateralID = b.CollateralID,
    PlateNo = b.PlateNo,
    EngineNo = b.EngineNo,
    datecreated = b.datecreated,
    PreparedBy = b.PreparedBy
FROM TestAllLoanWithCollaterals a
INNER JOIN cte b
    ON a.ChassisNo = b.ChassisNo
WHERE
    b.rn = 1;
Note that the above update logic simply overwrites all fields among duplicates by chassis number with those of the most recently created record in the group.
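To see the effect on a small data set, here is a rough sketch of the same idea run against an in-memory SQLite database through Python's built-in sqlite3 module. The table name loans, its reduced column list and the sample rows are made up for illustration. SQLite only gained UPDATE ... FROM in fairly recent versions, so this sketch swaps the join for a correlated row-value subquery against the same ROW_NUMBER() CTE; it assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down version of TestAllLoanWithCollaterals.
    CREATE TABLE loans (
        id INTEGER PRIMARY KEY, ChassisNo TEXT, PlateNo TEXT,
        EngineNo TEXT, datecreated TEXT
    );
    INSERT INTO loans VALUES
        (1, 'CH-1', 'OLD-111', 'E-1', '2020-01-01'),
        (2, 'CH-1', 'MID-222', 'E-2', '2021-01-01'),
        (3, 'CH-1', 'NEW-333', 'E-3', '2022-01-01'),
        (4, 'CH-2', 'XYZ-999', 'E-9', '2021-06-01');
""")

with conn:
    conn.execute("""
        WITH cte AS (
            SELECT *, ROW_NUMBER() OVER (
                PARTITION BY ChassisNo ORDER BY datecreated DESC
            ) AS rn
            FROM loans
        )
        UPDATE loans
        SET (PlateNo, EngineNo, datecreated) = (
            -- rn = 1 is the most recently created row of the same ChassisNo
            SELECT b.PlateNo, b.EngineNo, b.datecreated
            FROM cte b
            WHERE b.ChassisNo = loans.ChassisNo AND b.rn = 1
        )
    """)

rows = conn.execute("SELECT id, PlateNo, datecreated FROM loans ORDER BY id").fetchall()
for row in rows:
    print(row)
```

Every CH-1 row ends up carrying the plate and date of the newest CH-1 record, while the single CH-2 row is simply overwritten with its own values.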

Issue with query in DB2

I was using the query below in SQL Server to update the table "TABLE" using the same table "TABLE". In SQL Server the query works fine, but in DB2 it fails. I am not sure whether I need to change the query for it to work in DB2.
The error I am getting in DB2 is
ExampleExceptionFormatter: exception message was: DB2 SQL Error:
SQLCODE=-204, SQLSTATE=42704
This is my input data; you can see that ENO 679 is repeated in both round 3 and round 4.
My expected output is given below. Here I am taking the ID and round value from round 4 and updating rownumber 3 with the ID value from rownumber 4.
My requirement is to find each ENO which exists in both round 3 and round 4 and update the values accordingly.
UPDATE TGT
SET TGT.ROUND = SRC.ROUND,
TGT.ID = SRC.ID
FROM TABLE TGT INNER JOIN TABLE SRC
ON TGT.ROUND='3' and SRC.ROUND='4' and TGT.ENO = SRC.ENO
Could someone help here, please? I tried something like this, but it's not working:
UPDATE TABLE
SET ID = (SELECT t.ID
          FROM TABLE t, TABLE t2
          WHERE t.ENO = t2.ENO AND t.ROUND = '4' AND t2.ROUND = '3'
         ),
    ROUND = (SELECT t.ROUND
             FROM TABLE t, TABLE t2
             WHERE t.ENO = t2.ENO AND t.ROUND = '4' AND t2.ROUND = '3')
WHERE ROUND = '3'
You may try this. I think the issue is that you are not relating your inner subquery to the outer main table.
UPDATE TABLE TB
SET TB.ID = (SELECT t.ID
             FROM TABLE t, TABLE t2
             WHERE TB.ENO = t.ENO -- added this
               AND t.ENO = t2.ENO AND t.ROUND = '4' AND t2.ROUND = '3'
            ),
    TB.ROUND = (SELECT t.ROUND
                FROM TABLE t, TABLE t2
                WHERE TB.ENO = t.ENO -- added this
                  AND t.ENO = t2.ENO AND t.ROUND = '4' AND t2.ROUND = '3')
WHERE TB.ROUND = '3'
Try this:
UPDATE MY_SAMPLE TGT
SET (ID, ROUND) = (SELECT ID, ROUND FROM MY_SAMPLE WHERE ENO = TGT.ENO AND ROUND = 4)
WHERE ROUND = 3 AND EXISTS (SELECT 1 FROM MY_SAMPLE WHERE ENO = TGT.ENO AND ROUND = 4);
The difference with yours is that the correlated subquery has to be a row-subselect, it has to guarantee zero or one row (and will assign nulls in case of returning zero rows). The EXISTS subquery excludes rows for which the correlated subquery will not return rows.
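As a quick way to check the behaviour, here is a small sketch of the row-subselect update run against an in-memory SQLite database through Python's built-in sqlite3 module (SQLite also accepts the SET (a, b) = (SELECT ...) form). It is not DB2, and the sample rows are invented; only ENO 679 appears in both rounds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MY_SAMPLE (ID INTEGER, ENO INTEGER, ROUND INTEGER);
    -- Invented sample data: ENO 679 exists in round 3 and round 4.
    INSERT INTO MY_SAMPLE VALUES
        (1, 123, 3),
        (2, 679, 3),
        (3, 679, 4),
        (4, 555, 4);
""")

with conn:
    conn.execute("""
        UPDATE MY_SAMPLE
        SET (ID, ROUND) = (
            -- row-subselect: must yield at most one row per target row
            SELECT s.ID, s.ROUND
            FROM MY_SAMPLE s
            WHERE s.ENO = MY_SAMPLE.ENO AND s.ROUND = 4
        )
        WHERE ROUND = 3
          -- without EXISTS, ENO 123 would have ID and ROUND set to NULL
          AND EXISTS (SELECT 1 FROM MY_SAMPLE s
                      WHERE s.ENO = MY_SAMPLE.ENO AND s.ROUND = 4)
    """)

rows = conn.execute("SELECT ID, ENO, ROUND FROM MY_SAMPLE ORDER BY ENO, ID").fetchall()
print(rows)
```

The round 3 row for ENO 679 takes over the ID and round of its round 4 counterpart, and ENO 123 is left alone because the EXISTS filter skips it.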

System Versioned (Temporal) tables in a view

I have a number of joined "system versioned" tables, e.g. Person, PhoneNumber and EmailAddress
The Person will only have one PhoneNumber and one EmailAddress at a time.
The PhoneNumber and EmailAddress will not usually be updated outside of a transaction that updates all 3 at once. (But they can be updated independently, just not in the normal scenario)
E.g. if I change the phone number, then all 3 records will be issued an update in the same transaction, hence giving them all the same "start time" in the history table.
Let's say I insert a person and then change the person's name, email address and phone number in 2 transactions:
DECLARE @Id TABLE(ID INT)
DECLARE @PersonId INT
-- Initial insert
BEGIN TRANSACTION
INSERT INTO Person (Name) OUTPUT inserted.PersonId INTO @Id VALUES ('Homer')
SELECT @PersonId = Id FROM @Id
INSERT INTO EmailAddress (Address, PersonId) VALUES ('homer@fake', @PersonId)
INSERT INTO PhoneNumber (Number, PersonId) VALUES ('999', @PersonId)
COMMIT TRANSACTION
-- Update
WAITFOR DELAY '00:00:02'
BEGIN TRANSACTION
UPDATE Person SET Name = 'Kwyjibo' WHERE PersonID = @PersonId
UPDATE EmailAddress SET Address = 'kwyjibo@fake' WHERE PersonID = @PersonId
UPDATE PhoneNumber SET Number = '000' WHERE PersonID = @PersonId
COMMIT TRANSACTION
Now I select from the view (just an inner join of the tables) using a temporal query:
SELECT * FROM vwPerson FOR SYSTEM_TIME ALL
WHERE PersonId = @PersonId
ORDER BY SysStartTime DESC
And I get returned a row for every combination of edit!
How can I query this view (if at all possible) to only return 1 row for the updates that were made in the same transaction?
I could add a WHERE clause to match all the SysStartTimes, however that would exclude those cases where a table was updated independently of the other 2.
Because of the independent updates, you first have to "reconstruct" a timeline onto which you can join the data. A sketch follows; I don't have your actual table definitions, so it is untested:
;WITH AllTimes as (
    SELECT PersonId,SysStartTime as ATime FROM Person
    UNION
    SELECT PersonId,SysEndTime FROM Person
    UNION
    SELECT PersonId,SysStartTime FROM EmailAddress
    UNION
    SELECT PersonId,SysEndTime FROM EmailAddress
    UNION
    SELECT PersonId,SysStartTime FROM PhoneNumber
    UNION
    SELECT PersonId,SysEndTime FROM PhoneNumber
), Ordered as (
    SELECT
        PersonId, ATime, ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY ATime) rn
    FROM
        AllTimes
), Intervals as (
    SELECT
        o1.PersonId,
        o1.ATime as StartTime,
        o2.ATime as EndTime
    FROM
        Ordered o1
            inner join
        Ordered o2
            on
                o1.PersonId = o2.PersonId and
                o1.rn = o2.rn - 1
)
SELECT
    * --TODO - Columns
FROM
    Intervals i
        inner join
    Person p
        on
            i.PersonId = p.PersonId and
            i.StartTime < p.SysEndTime and
            p.SysStartTime < i.EndTime
        inner join
    EmailAddress e
        on
            i.PersonId = e.PersonId and
            i.StartTime < e.SysEndTime and
            e.SysStartTime < i.EndTime
        inner join
    PhoneNumber pn
        on
            i.PersonId = pn.PersonId and
            i.StartTime < pn.SysEndTime and
            pn.SysStartTime < i.EndTime
With appropriate filters if you just want one persons details, the optimizer will hopefully work it out. There may be additional filters for the joins that I've missed out also.
Hopefully you can see how the 3 CTEs construct the timeline. We take advantage of UNION eliminating duplicates in the first one.
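To make the timeline idea concrete, here is a reduced sketch with just two history tables, run against an in-memory SQLite database through Python's built-in sqlite3 module. The tables person and phone, the integer "timestamps" and the sample rows are all invented; real temporal tables would use datetime2 period columns, and SQLite needs window-function support (3.25 or later) for ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented history rows; integer times keep the example readable.
    CREATE TABLE person (PersonId INT, Name TEXT, SysStartTime INT, SysEndTime INT);
    CREATE TABLE phone (PersonId INT, Number TEXT, SysStartTime INT, SysEndTime INT);
    INSERT INTO person VALUES (1, 'Homer', 0, 10), (1, 'Kwyjibo', 10, 9999);
    -- The phone changes together with the name at t=10, then on its own at t=20.
    INSERT INTO phone VALUES (1, '999', 0, 10), (1, '000', 10, 20), (1, '111', 20, 9999);
""")

rows = conn.execute("""
    WITH AllTimes AS (            -- every instant at which anything changed
        SELECT PersonId, SysStartTime AS ATime FROM person
        UNION SELECT PersonId, SysEndTime FROM person
        UNION SELECT PersonId, SysStartTime FROM phone
        UNION SELECT PersonId, SysEndTime FROM phone
    ), Ordered AS (               -- number those instants per person
        SELECT PersonId, ATime,
               ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY ATime) AS rn
        FROM AllTimes
    ), Intervals AS (             -- pair each instant with the next one
        SELECT o1.PersonId, o1.ATime AS StartTime, o2.ATime AS EndTime
        FROM Ordered o1
        JOIN Ordered o2 ON o1.PersonId = o2.PersonId AND o1.rn = o2.rn - 1
    )
    SELECT i.StartTime, i.EndTime, p.Name, ph.Number
    FROM Intervals i
    JOIN person p ON i.PersonId = p.PersonId
                 AND i.StartTime < p.SysEndTime AND p.SysStartTime < i.EndTime
    JOIN phone ph ON i.PersonId = ph.PersonId
                 AND i.StartTime < ph.SysEndTime AND ph.SysStartTime < i.EndTime
    ORDER BY i.StartTime
""").fetchall()

for row in rows:
    print(row)
```

The joint change at t=10 produces a single row, and the independent phone change at t=20 still gets its own row, which is the behaviour the question asks for.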

SQL Select set of records from one table, join each record to top 1 record of second table matching 1 column, sorted by a column in the second table

This is my first question on here, so I apologize if I break any rules.
Here's the situation. I have a table that lists all the employees and the building to which they are assigned, plus training hours, with ssn as the id column. I have another table that lists all the employees in the company, also with ssn, but including name and other personal data. The second table contains multiple records for each employee, at different points in time. What I need to do is select all the records in the first table from a certain building, then get the most recent name from the second table, plus allow the result set to be sorted by any of the columns returned.
I have this in place, and it works fine, it is just very slow.
A very simplified version of the tables are:
table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours DEC(5,2)) (7200 rows)
table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), sequence INT) (708,000 rows)
The sequence column in table2 is a number that corresponds to a predetermined date for entering these records; the higher the number, the more recent the entry. It is common/expected that each employee has several records, but some employees may not have a record for the most recent sequence (e.g. '8').
My SProc is:
@BuildingNumber CHAR(7), @SortField VARCHAR(25)
BEGIN
DECLARE @returnValue TABLE(ssn CHAR(9), buildingNumber CHAR(7), fname VARCHAR(20), lName VARCHAR(20), rowNumber INT)
INSERT INTO @returnValue(...)
SELECT(ssn,buildingNum,fname,lname,rowNum)
FROM SELECT(...,CASE @SortField Row_Number() OVER (PARTITION BY buildingNumber ORDER BY {sortField column} END AS RowNumber)
FROM table1 a
OUTER APPLY(SELECT TOP 1 fName,lName FROM table2 WHERE ssn = a.ssn ORDER BY sequence DESC) AS e
where buildingNumber = @BuildingNumber
SELECT * from @returnValue ORDER BY RowNumber
END
I have indexes for the following:
table1: buildingNumber(non-unique,nonclustered)
table2: sequence_ssn(unique,nonclustered)
Like I said this gets me the correct result set, but it is rather slow. Is there a better way to go about doing this?
It's not possible to change the database structure or the way table 2 operates. Trust me if it were it would be done. Are there any indexes I could make that would help speed this up?
I've looked at the execution plans, and it has a clustered index scan on table 2(18%), then a compute scalar(0%), then an eager spool(59%), then a filter(0%), then top n sort(14%).
That's 78% of the execution so I know it's in the section to get the names, just not sure of a better(faster) way to do it.
The reason I'm asking is that table 1 needs to be updated with current data. This is done through a webpage with a radgrid control. It has a range, start index, all that, and it takes forever for the users to update their data.
I can change how the update process is done, but I thought I'd ask about the query first.
Thanks in advance.
I would approach this with window functions. The idea is to assign a sequence number to records in the table with duplicates (I think table2), such that the most recent record for each ssn has a value of 1. Then just select this as the most recent record:
select t1.*, t2.*
from table1 t1 join
     (select t2.*,
             row_number() over (partition by ssn order by sequence desc) as seqnum
      from table2 t2
     ) t2
     on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber;
My second suggestion is to use a user-defined function rather than a stored procedure:
create function XXX (
    @BuildingNumber char(7)
)
returns table as
return (
    select t1.ssn, t1.buildingNumber, t2.fName, t2.lName
    from table1 t1 join
         (select t2.*,
                 row_number() over (partition by ssn order by sequence desc) as seqnum
          from table2 t2
         ) t2
         on t1.ssn = t2.ssn and t2.seqnum = 1
    where t1.buildingNumber = @BuildingNumber
);
(This doesn't have the logic for the ordering because that doesn't seem to be the central focus of the question.)
You can then call it as:
select *
from dbo.XXX(<building number>);
EDIT:
The following may speed it up further, because you are only selecting a small(ish) subset of the employees:
select *
from (select t1.ssn, t1.buildingNumber, t1.trainingHours, t2.fName, t2.lName,
             row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
      from table1 t1 join
           table2 t2
           on t1.ssn = t2.ssn
      where t1.buildingNumber = @BuildingNumber
     ) t
where seqnum = 1;
And, finally, I suspect that the following might be the fastest:
select t1.*, t2.*, row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
from table1 t1 join
     table2 t2
     on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber and
      t2.sequence = (select max(sequence) from table2 t2a where t2a.ssn = t1.ssn);
In all these cases, an index on table2(ssn, sequence) should help performance.
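For anyone who wants to try the pattern on a toy data set, here is a small sketch of the first query together with the suggested index, run against an in-memory SQLite database through Python's built-in sqlite3 module. The sample rows and the index name ix_table2_ssn_sequence are invented, and SQLite (3.25 or later for window functions) says nothing about how SQL Server will plan or time the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ssn TEXT, buildingNumber TEXT, trainingHours REAL);
    CREATE TABLE table2 (ssn TEXT, fName TEXT, lName TEXT, sequence INTEGER);
    -- The suggested index: ssn first, so each employee's rows sit together in sequence order.
    CREATE INDEX ix_table2_ssn_sequence ON table2 (ssn, sequence);

    INSERT INTO table1 VALUES ('111', 'B1', 4.0), ('222', 'B1', 2.5), ('333', 'B2', 1.0);
    INSERT INTO table2 VALUES
        ('111', 'Ann', 'Smith', 7), ('111', 'Ann', 'Jones', 8),
        ('222', 'Bob', 'Brown', 6),   -- no row for the latest sequence (8)
        ('333', 'Cy', 'Young', 8);
""")

query = """
    SELECT t1.ssn, t2.fName, t2.lName
    FROM table1 t1
    JOIN (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY ssn ORDER BY sequence DESC) AS seqnum
          FROM table2 t) t2
      ON t1.ssn = t2.ssn AND t2.seqnum = 1   -- keep only each employee's newest row
    WHERE t1.buildingNumber = ?
    ORDER BY t1.ssn
"""
rows = conn.execute(query, ("B1",)).fetchall()
print(rows)
```

Employee 222 still shows up even though their newest record is sequence 6, because the ranking is done per ssn rather than against a fixed "latest" sequence value.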
Try using temp tables instead of the table variables. Not sure what kind of system you are working on, but I have had pretty good luck. Temp tables actually write to the drive so you won't be holding and processing so much in memory. Depending on other system usage this might do the trick.
Simply define the temp table using #Tablename instead of @Tablename. Put the name-sorting subquery in a temp table before everything else fires off and join to it.
Just make sure to drop the table at the end. It will be dropped automatically at the end of the SP or when the connection disconnects, but it is a good idea to drop it explicitly to be on the safe side.

How can I get this query to return 0 instead of null?

I have this query:
SELECT (SUM(tblTransaction.AmountPaid) - SUM(tblTransaction.AmountCharged)) AS TenantBalance, tblTransaction.TenantID
FROM tblTransaction
GROUP BY tblTransaction.TenantID
But there's a problem with it: there are other TenantIDs that don't have transactions, and I want to get those too.
For example, the transaction table has 3 rows for bob, 2 rows for john and none for jane. I want it to return the sum for bob and john AND return 0 for jane (or possibly null if there's no other way).
How can I do this?
Tables are like this:
Tenants
    ID
    Other Data
Transactions
    ID
    TenantID (fk to Tenants)
    Other Data
(You didn't state your sql engine, so I'm going to link to the MySQL documentation).
This is pretty much exactly what the COALESCE() function is meant for. You can feed it a list, and it'll return the first non-null value in the list. You would use this in your query as follows:
SELECT COALESCE((SUM(tr.AmountPaid) - SUM(tr.AmountCharged)), 0) AS TenantBalance, te.ID
FROM tblTenant AS te
LEFT JOIN tblTransaction AS tr ON (tr.TenantID = te.ID)
GROUP BY te.ID;
That way, if the SUM() result would be NULL, it's replaced with zero.
Edited: I rewrote the query using a LEFT JOIN as well as the COALESCE(), I think this is the key of what you were missing originally. If you only select from the Transactions table, there is no way to get information about things not in the table. However, by using a left join from the Tenants table, you should get a row for every existing tenant.
Below is a full walkthrough of the problem. The function isnull has also been included to ensure that a balance of zero (rather than null) is returned for Tenants with no transactions.
create table tblTenant
(
ID int identity(1,1) primary key not null,
Name varchar(100)
);
create table tblTransaction
(
ID int identity(1,1) primary key not null,
tblTenantID int,
AmountPaid money,
AmountCharged money
);
insert into tblTenant(Name)
select 'bob' union all select 'Jane' union all select 'john';
insert into tblTransaction(tblTenantID,AmountPaid, AmountCharged)
select 1,5.00,10.00
union all
select 1,10.00,10.00
union all
select 1,10.00,10.00
union all
select 2,10.00,15.00
union all
select 2,15.00,15.00
select * from tblTenant
select * from tblTransaction
SELECT
tenant.ID,
tenant.Name,
isnull(SUM(Trans.AmountPaid) - SUM(Trans.AmountCharged),0) AS Balance
FROM tblTenant tenant
LEFT JOIN tblTransaction Trans ON
tenant.ID = Trans.tblTenantID
GROUP BY tenant.ID, tenant.Name;
drop table tblTenant;
drop table tblTransaction;
Select Tenants.ID, ISNULL((SUM(Transactions.AmountPaid) - SUM(Transactions.AmountCharged)), 0) AS TenantBalance
From Tenants
Left Outer Join Transactions On Tenants.ID = Transactions.TenantID
Group By Tenants.ID
I didn't syntax check it but it is close enough.
SELECT (SUM(ISNULL(tblTransaction.AmountPaid, 0))
- SUM(ISNULL(tblTransaction.AmountCharged, 0))) AS TenantBalance
, tblTransaction.TenantID
FROM tblTransaction
GROUP BY tblTransaction.TenantID
I only added this because if your intention is to account for one of the parts being null, you'll need to apply ISNULL to each part separately.
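The difference only shows up when one whole column is NULL for a tenant. Here is a small sketch of that case, run against an in-memory SQLite database through Python's built-in sqlite3 module; COALESCE stands in for SQL Server's ISNULL, and the sample rows and the column aliases outer_only and per_column are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblTransaction (TenantID INTEGER, AmountPaid REAL, AmountCharged REAL);
    -- Tenant 1 has been charged twice but has never paid: AmountPaid is always NULL.
    INSERT INTO tblTransaction VALUES (1, NULL, 10.0), (1, NULL, 5.0);
    -- Tenant 2 has one NULL payment among ordinary rows.
    INSERT INTO tblTransaction VALUES (2, 20.0, 15.0), (2, NULL, 15.0);
""")

rows = conn.execute("""
    SELECT TenantID,
           -- NULL handling applied once, around the whole difference
           COALESCE(SUM(AmountPaid) - SUM(AmountCharged), 0) AS outer_only,
           -- NULL handling applied to each amount before summing
           SUM(COALESCE(AmountPaid, 0)) - SUM(COALESCE(AmountCharged, 0)) AS per_column
    FROM tblTransaction
    GROUP BY TenantID
    ORDER BY TenantID
""").fetchall()

for row in rows:
    print(row)
```

For tenant 1, SUM(AmountPaid) is NULL, so the whole difference is NULL and the outer wrapper reports 0, hiding the 15 that is owed; the per-column form reports -15. For tenant 2 both forms agree, because SUM already skips individual NULLs.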
Actually, I found an answer:
SELECT tenant.ID, ISNULL(SUM(trans.AmountPaid) - SUM(trans.AmountCharged),0) AS Balance FROM tblTenant tenant
LEFT JOIN tblTransaction trans
ON tenant.ID = trans.TenantID
GROUP BY tenant.ID
