Deconstructing a complex SQL query into multiple queries and joins - sql-server

Could someone please help break up this complex SQL query into individual steps? I am trying to build multiple tables and joins out of this multistep query that feels quite opaque. Thank you!
select
visit_date,
count(vo1.visit_id) as num_visits,
sum(case when co1.person_id is not null then 1 else 0 end) as num_visits_w_cond
from
visit_occurrence vo1
left join
(select distinct
person_id, visit_id, condition_date
from
condition_occurrence
where
condition_id = 12345) co1 on vo1.person_id = co1.person_id
and vo1.visit_date = co1.condition_date
and vo1.visit_id = co1.visit_id
where
vo1.visit_id = 1234
group by
visit_date
order by
visit_date;
Ideally, I'd like to generate a few data tables in the intermediate steps and then join and count at the end but am not sure what this would look like.
EDIT: Thank you for your comments. Regarding clarity of the individual tables:
The first select will query the visit_occurrence table for all visit_ids that match a visit_id # of "1234" and return the distinct person_id, visit_date, and visit_id for all visits. So a person_id, visit_date, visit_id tuple is unique, i.e. the same person_id with a different visit_date or different visit_id does not qualify as a duplicate. Only an identical tuple is a duplicate.
The second select will query the condition_occurrence table for all condition_ids that match a condition_id # of 12345 and return distinct person_id, visit_id, condition_date. So a person_id, visit_id, condition_date tuple is unique, i.e. the same person_id with a different visit_id or condition_date is not a duplicate. Only an identical tuple is a duplicate.
Join table 1 and table 2 on person_id, visit_date = condition_date, visit_id = visit_id. Then count how many distinct person_ids occur on each date.
From table 1, count how many visit_ids are associated with each date.
Hopefully that's more clear? Thank you again for the feedback.

I doubt this will perform any better, but it's close to what you're asking for. I would stay away from temp tables (which is what I inferred from your question).
with cteVO as --common table expression for visit_occurrence
(
select distinct person_id, visit_date, visit_id
from visit_occurrence
where visit_id = 1234
),
cteCO as -- common table expression for condition_occurrence
(
select distinct person_id, visit_id, condition_date
from condition_occurrence
where condition_id = 12345
)
-- Join both CTEs to get the count of person_id and count of visit_id
SELECT cteVO.visit_date, COUNT(cteVO.person_id) AS count_person_id,
COUNT(cteVO.visit_id) AS count_visit_id
FROM cteVO
INNER JOIN cteCO ON cteVO.person_id = cteCO.person_id
AND cteVO.visit_date = cteCO.condition_date
AND cteVO.visit_id = cteCO.visit_id
GROUP BY cteVO.visit_date
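The original query used a LEFT JOIN so that visits without the condition still count toward num_visits, whereas an INNER JOIN keeps only matched visits. The following is a minimal sketch of the same two-CTE breakdown with the LEFT JOIN retained. It runs against an in-memory SQLite database through Python's sqlite3 module so it can be tried without a SQL Server instance; the sample rows are invented for illustration, and SQLite's dialect is only assumed to be close enough to T-SQL for this query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit_occurrence (person_id INT, visit_id INT, visit_date TEXT);
CREATE TABLE condition_occurrence (
    person_id INT, visit_id INT, condition_date TEXT, condition_id INT);

-- Made-up rows, for illustration only.
INSERT INTO visit_occurrence VALUES
    (1, 1234, '2020-01-01'),
    (2, 1234, '2020-01-01'),
    (3, 1234, '2020-01-02'),
    (4, 9999, '2020-01-02');         -- other visit_id, filtered out
INSERT INTO condition_occurrence VALUES
    (1, 1234, '2020-01-01', 12345),
    (1, 1234, '2020-01-01', 12345),  -- duplicate, collapsed by DISTINCT
    (3, 1234, '2020-01-02', 777);    -- other condition, filtered out
""")

query = """
WITH cteVO AS (   -- step 1: the visits of interest
    SELECT DISTINCT person_id, visit_date, visit_id
    FROM visit_occurrence
    WHERE visit_id = 1234
),
cteCO AS (        -- step 2: the condition occurrences of interest
    SELECT DISTINCT person_id, visit_id, condition_date
    FROM condition_occurrence
    WHERE condition_id = 12345
)
-- step 3: keep every visit, flag the ones that have a matching condition
SELECT vo.visit_date,
       COUNT(vo.visit_id) AS num_visits,
       SUM(CASE WHEN co.person_id IS NOT NULL THEN 1 ELSE 0 END) AS num_visits_w_cond
FROM cteVO vo
LEFT JOIN cteCO co
       ON vo.person_id = co.person_id
      AND vo.visit_date = co.condition_date
      AND vo.visit_id = co.visit_id
GROUP BY vo.visit_date
ORDER BY vo.visit_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each CTE can be run on its own (select * from cteVO) to inspect the intermediate table before the join.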

Related

SQL Server : return value in specific table2 column based on value in table1

I have a query that gets data from 2 tables.
Transaction table contains week_id, customer_id, upc12, sales_dollars
Products table contains upc12, column_1, column_2, column_3
I want my query to return the value in products table, based on what the customer_id is in the transaction table. customer_id = 1 should return column_1, customer_id = 2 should return column_3, etc.
SELECT
t.week_id,
customer_id,
upc12,
p.___________, sum(t.sales_dollars)
FROM
transaction t, products p
WHERE
t.upc_12 = p.upc_12
GROUP BY
t.week_id, customer_id, upc12, p.___________
Sorry if this makes no sense, but my research hasn't been very good, as I don't know how to correctly formulate my question. You probably guessed I'm new to SQL.
Thanks!
Here is one way to do it:
;WITH cte as
(
SELECT
t.week_id,
t.customer_id,
t.upc12,
-- pick the products column that corresponds to the customer
CASE t.customer_id
WHEN 1 THEN p.column_1
WHEN 2 THEN p.column_2
WHEN 3 THEN p.column_3
END As ColByCustomer,
t.sales_dollars
FROM [transaction] t -- TRANSACTION is a reserved word, so the table name is bracketed
INNER JOIN products p on t.upc12 = p.upc12
)
SELECT week_id, customer_id, upc12, ColByCustomer, SUM(sales_dollars) AS sales_dollars
FROM cte
GROUP BY week_id, customer_id, upc12, ColByCustomer
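To see the CASE mapping in action, here is a small sketch run against an in-memory SQLite database via Python's sqlite3 module. The table is named transactions here (an assumption, to sidestep the reserved word), the join column follows the table description (upc12), and all sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (week_id INT, customer_id INT, upc12 TEXT, sales_dollars INT);
CREATE TABLE products (upc12 TEXT, column_1 TEXT, column_2 TEXT, column_3 TEXT);

-- Made-up rows, for illustration only.
INSERT INTO products VALUES ('A', 'c1A', 'c2A', 'c3A');
INSERT INTO transactions VALUES
    (1, 1, 'A', 10),
    (1, 1, 'A', 5),
    (1, 2, 'A', 7);
""")

query = """
WITH cte AS (
    SELECT t.week_id, t.customer_id, t.upc12,
           -- choose the products column according to the customer
           CASE t.customer_id
               WHEN 1 THEN p.column_1
               WHEN 2 THEN p.column_2
               WHEN 3 THEN p.column_3
           END AS ColByCustomer,
           t.sales_dollars
    FROM transactions t
    INNER JOIN products p ON t.upc12 = p.upc12
)
SELECT week_id, customer_id, upc12, ColByCustomer, SUM(sales_dollars)
FROM cte
GROUP BY week_id, customer_id, upc12, ColByCustomer
ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A customer_id with no WHEN branch gets NULL in ColByCustomer, so an ELSE branch may be worth adding if that case can occur.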

SELECT from multiple queries

I have these tables:
tblDiving(
diving_number int primary key,
diving_club int,
date_of_diving date)
tblDivingClub(
number int primary key not null check (number>0),
name char(30),
country char(30))
tblWorks_for(
diver_number int,
club_number int,
end_working_date date)
tblCountry(
name char(30) not null primary key)
I need to write a query that returns the name of each country and the number of "super clubs" in it.
A super club is a club that has more than 25 working divers (tblWorks_for.end_working_date is null) or has had more than 100 dives (tblDiving) in the last year.
After I get each country and its number of super clubs, I need to show only the countries that contain more than 2 super clubs.
I wrote these 2 queries:
select tblDivingClub.name,count(distinct tblWorks_for.diver_number) as number_of_guids
from tblWorks_for
inner join tblDivingClub on tblDivingClub.number = tblWorks_for.club_number,tblDiving
where tblWorks_for.end_working_date is null
group by tblDivingClub.name
select tblDivingClub.name, count(distinct tblDiving.diving_number) as number_of_divings
from tblDivingClub
inner join tblDiving on tblDivingClub.number = tblDiving.diving_club
WHERE tblDiving.date_of_diving <= DATEADD(year,-1, GETDATE())
group by tblDivingClub.name
But I don't know how to continue.
Each query works on its own, but how do I combine them and select from them?
It's a university assignment and I'm not allowed to use views or temporary tables.
It's my first program so I'm not really sure what I'm doing :)
WITH CTE AS (
-- clubs with their number of currently working divers
select tblDivingClub.name, count(distinct tblWorks_for.diver_number) as diving_number
from tblWorks_for
inner join tblDivingClub on tblDivingClub.number = tblWorks_for.club_number
where tblWorks_for.end_working_date is null
group by tblDivingClub.name
UNION ALL
-- clubs with their number of dives during the last year (>= one year ago)
select tblDivingClub.name, count(distinct tblDiving.diving_number) as diving_number
from tblDivingClub
inner join tblDiving on tblDivingClub.number = tblDiving.diving_club
WHERE tblDiving.date_of_diving >= DATEADD(year,-1, GETDATE())
group by tblDivingClub.name
)
SELECT * FROM CTE
You can combine the queries using a UNION ALL as long as there are the same number of columns in each query. You can then roll them into a Common Table Expression (CTE) and do a select from that.
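One possible way to carry the idea through to the final result (countries with more than 2 super clubs) is sketched below. It is not the answerer's query: it uses UNION rather than UNION ALL so that a club qualifying on both criteria is counted once, and it applies the 25 and 100 thresholds before combining. The sketch runs on SQLite through Python's sqlite3 module, so date('now', '-1 year') stands in for DATEADD(year, -1, GETDATE()); the CTE names and the generated sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblDivingClub (number INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE tblWorks_for (diver_number INT, club_number INT, end_working_date TEXT);
CREATE TABLE tblDiving (diving_number INTEGER PRIMARY KEY, diving_club INT, date_of_diving TEXT);
INSERT INTO tblDivingClub VALUES
    (1, 'c1', 'A'), (2, 'c2', 'A'), (3, 'c3', 'A'), (4, 'c4', 'B'), (5, 'c5', 'B');
""")

# Made-up data: clubs 1, 3 and 4 each have 26 divers who are still working.
for club in (1, 3, 4):
    conn.executemany(
        "INSERT INTO tblWorks_for VALUES (?, ?, NULL)",
        [(club * 1000 + d, club) for d in range(26)],
    )

# Clubs 1 and 2 have 101 recent dives; club 5 has 101 dives from two years ago.
dive_no = 0
for club, offset in ((1, "-30 days"), (2, "-30 days"), (5, "-2 years")):
    for _ in range(101):
        dive_no += 1
        conn.execute(
            "INSERT INTO tblDiving VALUES (?, ?, date('now', ?))",
            (dive_no, club, offset),
        )

query = """
WITH active_divers AS (
    SELECT club_number, COUNT(DISTINCT diver_number) AS n
    FROM tblWorks_for
    WHERE end_working_date IS NULL
    GROUP BY club_number
),
recent_dives AS (
    SELECT diving_club AS club_number, COUNT(*) AS n
    FROM tblDiving
    WHERE date_of_diving >= date('now', '-1 year')
    GROUP BY diving_club
),
super_clubs AS (
    -- UNION removes duplicates, so a club meeting both criteria appears once
    SELECT club_number FROM active_divers WHERE n > 25
    UNION
    SELECT club_number FROM recent_dives WHERE n > 100
)
SELECT c.country, COUNT(*) AS super_club_count
FROM super_clubs s
JOIN tblDivingClub c ON c.number = s.club_number
GROUP BY c.country
HAVING COUNT(*) > 2
"""
rows = conn.execute(query).fetchall()
print(rows)
```

In this data country A has three super clubs (club 1 qualifies both ways but is counted once), while country B has only one and is filtered out by HAVING.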

SQL Select set of records from one table, join each record to top 1 record of second table matching 1 column, sorted by a column in the second table

This is my first question on here, so I apologize if I break any rules.
Here's the situation. I have a table that lists all the employees and the building to which they are assigned, plus training hours, with ssn as the id column. I have another table that lists all the employees in the company, also keyed by ssn, but including name and other personal data. The second table contains multiple records for each employee, taken at different points in time. What I need to do is select all the records in the first table for a certain building, then get the most recent name from the second table, and allow the result set to be sorted by any of the returned columns.
I have this in place, and it works fine; it is just very slow.
A very simplified version of the tables is:
table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours DEC(5,2)) (7200 rows)
table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), sequence INT) (708,000 rows)
The sequence column in table 2 is a number that corresponds to a predetermined date on which these records are entered; the higher the number, the more recent the entry. It is common and expected that each employee has several records, but some employees may not have the most recent one (i.e. '8').
My SProc is:
@BuildingNumber CHAR(7), @SortField VARCHAR(25)
BEGIN
DECLARE @returnValue TABLE(ssn CHAR(9), buildingNumber CHAR(7), fname VARCHAR(20), lName VARCHAR(20), rowNumber INT)
INSERT INTO @returnValue(...)
SELECT(ssn,buildingNum,fname,lname,rowNum)
FROM SELECT(...,CASE @SortField Row_Number() OVER (PARTITION BY buildingNumber ORDER BY {sortField column} END AS RowNumber)
FROM table1 a
OUTER APPLY(SELECT TOP 1 fName,lName FROM table2 WHERE ssn = a.ssn ORDER BY sequence DESC) AS e
where buildingNumber = @BuildingNumber
SELECT * from @returnValue ORDER BY RowNumber
END
I have indexes for the following:
table1: buildingNumber(non-unique,nonclustered)
table2: sequence_ssn(unique,nonclustered)
Like I said this gets me the correct result set, but it is rather slow. Is there a better way to go about doing this?
It's not possible to change the database structure or the way table 2 operates. Trust me if it were it would be done. Are there any indexes I could make that would help speed this up?
I've looked at the execution plans, and it has a clustered index scan on table 2(18%), then a compute scalar(0%), then an eager spool(59%), then a filter(0%), then top n sort(14%).
That's 78% of the execution so I know it's in the section to get the names, just not sure of a better(faster) way to do it.
The reason I'm asking is that table 1 needs to be updated with current data. This is done through a webpage with a radgrid control. It has a range, start index, all that, and it takes forever for the users to update their data.
I can change how the update process is done, but I thought I'd ask about the query first.
Thanks in advance.
I would approach this with window functions. The idea is to assign a sequence number to the records in the table with duplicates (I think table2), such that the most recent record for each ssn gets a value of 1. Then just select that record:
select t1.*, t2.*
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber;
My second suggestion is to use a user-defined function rather than a stored procedure:
create function XXX (
@BuildingNumber char(7)
)
returns table as
return (
select t1.ssn, t1.buildingNumber, t2.fName, t2.lName
from table1 t1 join
(select t2.*,
row_number() over (partition by ssn order by sequence desc) as seqnum
from table2 t2
) t2
on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber
);
(This doesn't have the logic for the ordering because that doesn't seem to be the central focus of the question.)
You can then call it as:
select *
from dbo.XXX(<building number>);
EDIT:
The following may speed it up further, because you are only selecting a small(ish) subset of the employees:
select *
from (select t1.ssn, t1.buildingNumber, t1.trainingHours, t2.fName, t2.lName,
row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber
) t
where seqnum = 1;
And, finally, I suspect that the following might be the fastest:
select t1.*, t2.*, row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
from table1 t1 join
table2 t2
on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber and
t2.sequence = (select max(sequence) from table2 t2a where t2a.ssn = t1.ssn)
In all these cases, an index on table2(ssn, sequence) should help performance.
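As a rough check that the window-function form and the max(sequence) form pick the same "latest" row, here is a small sketch on an in-memory SQLite database through Python's sqlite3 module (window functions need SQLite 3.25 or later). The sample rows and the index name ix_table2_ssn_sequence are made up for illustration, and SQLite timings say nothing about how SQL Server would perform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ssn TEXT, buildingNumber TEXT, trainingHours REAL);
CREATE TABLE table2 (ssn TEXT, fName TEXT, lName TEXT, sequence INT);
-- the suggested index; the name is hypothetical
CREATE INDEX ix_table2_ssn_sequence ON table2 (ssn, sequence);

-- Made-up rows, for illustration only.
INSERT INTO table1 VALUES ('111', 'B1', 1.5), ('222', 'B1', 2.0), ('333', 'B2', 3.0);
INSERT INTO table2 VALUES
    ('111', 'Ann', 'Old', 7),
    ('111', 'Ann', 'New', 8),
    ('222', 'Bob', 'Only', 5),   -- has no sequence 8 record
    ('333', 'Cy', 'X', 8);
""")

# Variant 1: number each employee's records, newest first, and keep number 1.
window_sql = """
SELECT t1.ssn, t1.buildingNumber, t2.fName, t2.lName
FROM table1 t1
JOIN (SELECT ssn, fName, lName,
             ROW_NUMBER() OVER (PARTITION BY ssn ORDER BY sequence DESC) AS seqnum
      FROM table2) t2
  ON t1.ssn = t2.ssn AND t2.seqnum = 1
WHERE t1.buildingNumber = ?
ORDER BY t1.ssn
"""

# Variant 2: keep the record whose sequence equals the employee's maximum.
max_sql = """
SELECT t1.ssn, t1.buildingNumber, t2.fName, t2.lName
FROM table1 t1
JOIN table2 t2 ON t1.ssn = t2.ssn
WHERE t1.buildingNumber = ?
  AND t2.sequence = (SELECT MAX(sequence) FROM table2 t2a WHERE t2a.ssn = t1.ssn)
ORDER BY t1.ssn
"""

window_rows = conn.execute(window_sql, ("B1",)).fetchall()
max_rows = conn.execute(max_sql, ("B1",)).fetchall()
print(window_rows)
print(max_rows)
```

Both variants return the sequence 8 name for employee 111 and the only available record for employee 222.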
Try using temp tables instead of the table variable. I'm not sure what kind of system you are working on, but I have had pretty good luck with them. Temp tables live in tempdb and carry column statistics, which table variables lack, so the optimizer tends to make better choices once row counts grow. Depending on other system usage, this might do the trick.
Simply define the temp table using #Tablename instead of @Tablename. Put the name-sorting subquery into a temp table before everything else fires off, and join to it.
Just make sure to drop the table at the end. It will be dropped at the end of the SP when the session disconnects, but it is a good idea to drop it explicitly to be on the safe side.

Check for applicable Group query for shopping cart

I have a problem with one of my discount check conditions. My table structure is as follows:
Cart table (id, customerid, productid)
Group table (groupid, groupname, discountamount)
Group Products table (groupproductid, groupid, productid)
When an order is placed, there will be multiple items in the cart. I want to check those items against the topmost group: does the cart contain every product in that group?
Example:
If group 1 consists of 2 products and both of those products exist in the cart table, then the group 1 discount should be returned.
Please help.
It's tricky without real table definitions or sample data, so I've made some up:
create table Carts(
id int,
customerid int,
productid int
)
create table Groups(
groupid int,
groupname int,
discountamount int
)
create table GroupProducts(
groupproductid int,
groupid int,
productid int
)
insert into Carts (id,customerid,productid) values
(1,1,1),
(2,1,2),
(3,1,4),
(4,2,2),
(5,2,3)
insert into Groups (groupid,groupname,discountamount) values
(1,1,10),
(2,2,15),
(3,3,20)
insert into GroupProducts (groupproductid,groupid,productid) values
(1,1,1),
(2,1,5),
(3,2,2),
(4,2,4),
(5,3,2),
(6,3,3)
;With MatchedProducts as (
select
c.customerid,gp.groupid,COUNT(*) as Cnt
from
Carts c
inner join
GroupProducts gp
on
c.productid = gp.productid
group by
c.customerid,gp.groupid
), GroupSizes as (
select groupid,COUNT(*) as Cnt from GroupProducts group by groupid
), MatchingGroups as (
select
mp.*
from
MatchedProducts mp
inner join
GroupSizes gs
on
mp.groupid = gs.groupid and
mp.Cnt = gs.Cnt
)
select * from MatchingGroups
Which produces this result:
customerid  groupid     Cnt
----------- ----------- -----------
1           2           2
2           3           2
What we're doing here is called "relational division" - if you want to search elsewhere for that term. In my current results, each customer only matches one group - if there are multiple matches, we need some tie-breaking conditions to determine which group to report. I prompted with two suggestions in comments (lowest groupid or highest discountamount). Your response of "added earlier" doesn't help - we don't have a column which contains the addition dates of groups. Rows have no inherent ordering in SQL.
We would do the tie-breaking in the definition of MatchingGroups and the final select:
MatchingGroups as (
select
mp.*,
ROW_NUMBER() OVER (PARTITION BY mp.customerid ORDER BY /*Tie break criteria here */) as rn
from
MatchedProducts mp
inner join
GroupSizes gs
on
mp.groupid = gs.groupid and
mp.Cnt = gs.Cnt
)
select * from MatchingGroups where rn = 1
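As an illustration of one of the suggested tie-breaks (highest discountamount, then lowest groupid), here is a sketch that runs on an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or later for ROW_NUMBER). It reuses the sample data above and adds one hypothetical group 4, containing only product 2, so that each customer matches two groups. It assumes a cart never lists the same product twice, since COUNT(*) would otherwise overcount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Carts (id INT, customerid INT, productid INT);
CREATE TABLE "Groups" (groupid INT, groupname INT, discountamount INT);
CREATE TABLE GroupProducts (groupproductid INT, groupid INT, productid INT);

INSERT INTO Carts VALUES (1,1,1), (2,1,2), (3,1,4), (4,2,2), (5,2,3);
-- group 4 is a hypothetical extra row, added only to create a tie
INSERT INTO "Groups" VALUES (1,1,10), (2,2,15), (3,3,20), (4,4,5);
INSERT INTO GroupProducts VALUES
    (1,1,1), (2,1,5), (3,2,2), (4,2,4), (5,3,2), (6,3,3), (7,4,2);
""")

query = """
WITH MatchedProducts AS (
    -- how many products of each group are in each customer's cart
    SELECT c.customerid, gp.groupid, COUNT(*) AS Cnt
    FROM Carts c
    INNER JOIN GroupProducts gp ON c.productid = gp.productid
    GROUP BY c.customerid, gp.groupid
), GroupSizes AS (
    SELECT groupid, COUNT(*) AS Cnt FROM GroupProducts GROUP BY groupid
), MatchingGroups AS (
    -- a group matches when the cart covers all of its products
    SELECT mp.customerid, mp.groupid, g.discountamount,
           ROW_NUMBER() OVER (PARTITION BY mp.customerid
                              ORDER BY g.discountamount DESC, mp.groupid) AS rn
    FROM MatchedProducts mp
    INNER JOIN GroupSizes gs ON mp.groupid = gs.groupid AND mp.Cnt = gs.Cnt
    INNER JOIN "Groups" g ON g.groupid = mp.groupid
)
SELECT customerid, groupid, discountamount
FROM MatchingGroups
WHERE rn = 1
ORDER BY customerid
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Customer 1 matches groups 2 and 4 and customer 2 matches groups 3 and 4; the ordering keeps the group with the larger discount in each case.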

How can I get this query to return 0 instead of null?

I have this query:
SELECT (SUM(tblTransaction.AmountPaid) - SUM(tblTransaction.AmountCharged)) AS TenantBalance, tblTransaction.TenantID
FROM tblTransaction
GROUP BY tblTransaction.TenantID
But there's a problem with it: there are other TenantIDs that don't have transactions, and I want to get those too.
For example, the transaction table has 3 rows for bob, 2 rows for john and none for jane. I want it to return the sum for bob and john AND return 0 for jane (or possibly null if there's no other way).
How can I do this?
The tables look like this:
Tenants
    ID
    Other Data
Transactions
    ID
    TenantID (fk to Tenants)
    Other Data
(You didn't state your sql engine, so I'm going to link to the MySQL documentation).
This is pretty much exactly what the COALESCE() function is meant for. You can feed it a list, and it'll return the first non-null value in the list. You would use this in your query as follows:
SELECT COALESCE((SUM(tr.AmountPaid) - SUM(tr.AmountCharged)), 0) AS TenantBalance, te.ID
FROM tblTenant AS te
LEFT JOIN tblTransaction AS tr ON (tr.TenantID = te.ID)
GROUP BY te.ID;
That way, if the SUM() result would be NULL, it's replaced with zero.
Edited: I rewrote the query using a LEFT JOIN as well as COALESCE(); I think the join is the key piece you were missing originally. If you only select from the Transactions table, there is no way to get information about things that are not in that table. By using a left join from the Tenants table, however, you get a row for every existing tenant.
Below is a full walkthrough of the problem. The function isnull has also been included to ensure that a balance of zero (rather than null) is returned for Tenants with no transactions.
create table tblTenant
(
ID int identity(1,1) primary key not null,
Name varchar(100)
);
create table tblTransaction
(
ID int identity(1,1) primary key not null,
tblTenantID int,
AmountPaid money,
AmountCharged money
);
insert into tblTenant(Name)
select 'bob' union all select 'Jane' union all select 'john';
insert into tblTransaction(tblTenantID,AmountPaid, AmountCharged)
select 1,5.00,10.00
union all
select 1,10.00,10.00
union all
select 1,10.00,10.00
union all
select 2,10.00,15.00
union all
select 2,15.00,15.00
select * from tblTenant
select * from tblTransaction
SELECT
tenant.ID,
tenant.Name,
isnull(SUM(Trans.AmountPaid) - SUM(Trans.AmountCharged),0) AS Balance
FROM tblTenant tenant
LEFT JOIN tblTransaction Trans ON
tenant.ID = Trans.tblTenantID
GROUP BY tenant.ID, tenant.Name;
drop table tblTenant;
drop table tblTransaction;
Select Tenants.ID, ISNULL((SUM(Transactions.AmountPaid) - SUM(Transactions.AmountCharged)), 0) AS TenantBalance
From Tenants
Left Outer Join Transactions On Tenants.ID = Transactions.TenantID
Group By Tenants.ID
I didn't syntax check it but it is close enough.
SELECT (SUM(ISNULL(tblTransaction.AmountPaid, 0))
- SUM(ISNULL(tblTransaction.AmountCharged, 0))) AS TenantBalance
, tblTransaction.TenantID
FROM tblTransaction
GROUP BY tblTransaction.TenantID
I only added this because, if your intention is to account for one of the parts being null, you'll need to apply ISNULL to each part separately.
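The difference shows up when one column is NULL on every row for a tenant: SUM over only NULLs is NULL, so the whole subtraction becomes NULL and an outer ISNULL turns it into 0, hiding the other column's total. A small sketch on an in-memory SQLite database via Python's sqlite3 module, with COALESCE standing in for ISNULL and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblTransaction (TenantID INT, AmountPaid INT, AmountCharged INT);

-- Made-up rows: tenant 1 has no charges recorded at all.
INSERT INTO tblTransaction VALUES
    (1, 10, NULL),
    (1, 5, NULL),
    (2, 20, 4),
    (2, NULL, 6);
""")

query = """
SELECT TenantID,
       -- NULL handling applied once, around the whole expression
       COALESCE(SUM(AmountPaid) - SUM(AmountCharged), 0) AS outer_only,
       -- NULL handling applied to each column before summing
       SUM(COALESCE(AmountPaid, 0)) - SUM(COALESCE(AmountCharged, 0)) AS per_column
FROM tblTransaction
GROUP BY TenantID
ORDER BY TenantID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For tenant 1 the outer-only form reports 0 while the per-column form reports 15; for tenant 2, where each column has at least one non-NULL value, both forms agree because SUM already skips individual NULLs.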
Actually, I found an answer:
SELECT tenant.ID, ISNULL(SUM(trans.AmountPaid) - SUM(trans.AmountCharged),0) AS Balance FROM tblTenant tenant
LEFT JOIN tblTransaction trans
ON tenant.ID = trans.TenantID
GROUP BY tenant.ID

Resources