SELECT from multiple queries - sql-server

I have these tables:
tblDiving(
diving_number int primary key
diving_club int
date_of_diving date)
tblDivingClub(
number int primary key not null check (number>0),
name char(30),
country char(30))
tblWorks_for(
diver_number int
club_number int
end_working_date date)
tblCountry(
name char(30) not null primary key)
I need to write a query that returns the name of each country and the number of "Super clubs" in it.
A Super club is a club that has more than 25 working divers (tblWorks_for.end_working_date is null) or has had more than 100 dives (tblDiving) in the last year.
After I get the countries and their number of Super clubs, I need to show only the countries that contain more than 2 Super clubs.
I wrote these 2 queries:
select tblDivingClub.name,count(distinct tblWorks_for.diver_number) as number_of_guids
from tblWorks_for
inner join tblDivingClub on tblDivingClub.number = tblWorks_for.club_number,tblDiving
where tblWorks_for.end_working_date is null
group by tblDivingClub.name
select tblDivingClub.name, count(distinct tblDiving.diving_number) as number_of_divings
from tblDivingClub
inner join tblDiving on tblDivingClub.number = tblDiving.diving_club
WHERE tblDiving.date_of_diving <= DATEADD(year,-1, GETDATE())
group by tblDivingClub.name
But I don't know how to continue.
Each query works on its own, but how do I combine them and select from the result?
It's a university assignment and I'm not allowed to use views or temporary tables.
It's my first program so I'm not really sure what I'm doing:)

WITH CTE AS (
    -- Active divers per club. The stray ", tblDiving" from the question is dropped here:
    -- it cross joined every row with tblDiving, which count(distinct ...) only masks.
    select tblDivingClub.name, count(distinct tblWorks_for.diver_number) as diving_number
    from tblWorks_for
    inner join tblDivingClub on tblDivingClub.number = tblWorks_for.club_number
    where tblWorks_for.end_working_date is null
    group by tblDivingClub.name
    UNION ALL
    -- Dives per club in the last year. Note >= rather than <=, which would pick dives older than a year.
    select tblDivingClub.name, count(distinct tblDiving.diving_number) as diving_number
    from tblDivingClub
    inner join tblDiving on tblDivingClub.number = tblDiving.diving_club
    WHERE tblDiving.date_of_diving >= DATEADD(year, -1, GETDATE())
    group by tblDivingClub.name
)
SELECT * FROM CTE
You can combine the queries using a UNION ALL as long as there are the same number of columns in each query. You can then roll them into a Common Table Expression (CTE) and do a select from that.
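To go from the combined rows to the final result the question asks for, one possible sketch is shown below. It is not a definitive solution: it assumes "in the last year" means the last 12 months, and the CTE names ActiveDivers, RecentDives and SuperClubs are made up for illustration. It uses only CTEs, so no views or temporary tables are needed.

```sql
WITH ActiveDivers AS (
    -- Clubs with more than 25 divers who are still working there.
    SELECT club_number
    FROM tblWorks_for
    WHERE end_working_date IS NULL
    GROUP BY club_number
    HAVING COUNT(DISTINCT diver_number) > 25
),
RecentDives AS (
    -- Clubs with more than 100 dives in the last 12 months (assumed meaning of "last year").
    SELECT diving_club AS club_number
    FROM tblDiving
    WHERE date_of_diving >= DATEADD(year, -1, GETDATE())
    GROUP BY diving_club
    HAVING COUNT(*) > 100
),
SuperClubs AS (
    -- UNION (not UNION ALL) so a club that meets both rules is counted once.
    SELECT club_number FROM ActiveDivers
    UNION
    SELECT club_number FROM RecentDives
)
SELECT dc.country, COUNT(*) AS super_clubs
FROM SuperClubs AS sc
INNER JOIN tblDivingClub AS dc ON dc.number = sc.club_number
GROUP BY dc.country
HAVING COUNT(*) > 2;  -- keep only countries with more than 2 Super clubs
```

Grouping by club number rather than club name avoids merging two clubs that happen to share a name.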

Related

User Defined Function with a sub query

Create a user defined function named XXRepeatCustomer (where the XX are your initials). The function is to have one input parameter. Use the INT datatype for the input parameter. When the function is executed it is to return a three column table (CustFirstName, CustLastName, and Phone) for customers that placed a number of orders greater than or equal to the number passed in via the input parameter.
In order to receive a total number of orders placed I have joined together the Customer and CustOrder tables. The problem only wants me to show the first name, last name, and phone of each customer, but not the total of orders. I'm struggling with assigning the @orders parameter and with counting the total number of orders in the sub query.
CREATE FUNCTION dbo.JERepeatCustomer
(@orders INT)
RETURNS TABLE AS
RETURN (SELECT CustFirstName, CustLastName, Phone
FROM Customer C JOIN CustOrder CO
ON C.CustomerID = CO.CustomerID
WHERE @orders <= OrderID AND OrderID = (SELECT COUNT (DISTINCT OrderID) FROM CustOrder)
GROUP BY CustFirstName, CustLastName, Phone)
I expect the user to enter a 7, or any number, and the results show only the customers who have ordered 7, or more.
The keyword you need is HAVING. HAVING is similar to WHERE, but it is applied at a different stage: WHERE filters individual rows before they are grouped, while HAVING filters whole groups after aggregation, so it can test an aggregate such as COUNT().
For example, you have a customer table, and in your orders table, you have all the orders for each customer.
DECLARE @input INT = 7;
SELECT ct.customer, ct.phone, COUNT(ot.orderID)
FROM customertable ct
INNER JOIN ordertable ot
ON ct.customerID = ot.customerID
GROUP BY ct.customer, ct.phone
HAVING COUNT(ot.OrderID) >= @input
SELECT CustFirstName, CustLastName, Phone
FROM Customer C
CROSS APPLY (
    SELECT COUNT(*) AS Orders
    FROM CustOrder CO
    WHERE C.CustomerID = CO.CustomerID) CustomerOrders
WHERE CustomerOrders.Orders >= @orders
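Putting the HAVING approach back into the function from the question could look roughly like the sketch below. It assumes the Customer and CustOrder tables have exactly the columns named in the question (CustomerID, CustFirstName, CustLastName, Phone, OrderID); nothing else about the schema is known.

```sql
CREATE FUNCTION dbo.JERepeatCustomer (@orders INT)
RETURNS TABLE
AS
RETURN (
    SELECT C.CustFirstName, C.CustLastName, C.Phone
    FROM Customer AS C
    INNER JOIN CustOrder AS CO
        ON C.CustomerID = CO.CustomerID
    -- CustomerID is part of the grouping so that two customers who share
    -- a name and phone number are not merged into one group.
    GROUP BY C.CustomerID, C.CustFirstName, C.CustLastName, C.Phone
    -- The order count is only used as a filter; it is not returned.
    HAVING COUNT(CO.OrderID) >= @orders
);
GO
-- GO is a batch separator understood by SSMS and sqlcmd, not a T-SQL statement.
SELECT * FROM dbo.JERepeatCustomer(7);
```

Because the count appears only in HAVING, the function still returns just the three requested columns.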

TSQL - De-duplication report - grouping

So I'm trying to create a report that ranks a duplicate record, the idea behind this is that the customer wants to merge a whole lot of duplicate records that came about from a migration.
I need the ranking so that my report can show which record should be the "main" record, i.e. the record that will have missing data pulled into it.
The duplicate definition is pretty simple:
If the email addresses are the same then it is always a duplicate, if
the emails do not match, then the first name, surname, and mobile must
match.
The ranking will be based on a whole bunch of columns in the table, so:
email address isn't NULL = 50
phone number isn't NULL = 20
etc.. whichever gets the highest number in the duplicate group becomes the main record. This is where I am having issues, I can't seem to find a way to get an incremental number for each duplicate set. This is some of the code I have so far:
( I took out some of the rank columns in the temp table and CTE expression to shorten it )
DECLARE @tmp_Duplicates TABLE (
tmp_personID INT
, tmp_Firstname NVARCHAR(100)
, tmp_Surname NVARCHAR(100)
, tmp_HomeEmail NVARCHAR(300)
, tmp_MobileNumber NVARCHAR(100)
--- Ratings
, tmp_HomeEmail_Rating INT
--- Groupings
, tmp_GroupNumber INT
)
;WITH cteDupes AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY personHomeEmail ORDER BY personID DESC) AS RND,
ROW_NUMBER() OVER(PARTITION BY personHomeEmail ORDER BY personId) AS RNA,
p.personID, p.PersonFirstName, p.PersonSurname, p.PersonHomeEMail
, personMobileTelephone
FROM tblCandidate c INNER JOIN tblPerson p ON c.candidateID = p.personID
)
INSERT INTO @tmp_Duplicates
SELECT PersonID, PersonFirstName, PersonSurname, PersonHomeEMail, personMobileTelephone
, 10, RND
FROM cteDupes
WHERE RNA + RND > 2
ORDER BY personID, PersonFirstName, PersonSurname
SELECT * FROM @tmp_Duplicates
This gives me the results I want, but the group number isn't showing how I need it:
What I need is for each group to be an incremental value:
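One possible way to get an incremental number per duplicate set is DENSE_RANK(), which gives every row in the same set the same value and moves to the next integer for the next set. The sketch below is only an illustration: the @Person table and its rows are hypothetical stand-ins for tblPerson, and it assumes duplicates are defined by email alone (the first name, surname and mobile rule is left out).

```sql
-- Hypothetical sample rows standing in for tblPerson.
DECLARE @Person TABLE (personID INT, personHomeEmail NVARCHAR(300));
INSERT INTO @Person (personID, personHomeEmail)
VALUES (1, N'a@example.com'), (2, N'a@example.com'),
       (3, N'b@example.com'), (4, N'b@example.com'),
       (5, N'c@example.com');

WITH cteDupes AS
(
    -- GroupSize is the number of rows sharing the same email.
    SELECT personID, personHomeEmail,
           COUNT(*) OVER (PARTITION BY personHomeEmail) AS GroupSize
    FROM @Person
)
SELECT personID, personHomeEmail,
       -- Same email => same number; the next email gets the next integer.
       DENSE_RANK() OVER (ORDER BY personHomeEmail) AS GroupNumber
FROM cteDupes
WHERE GroupSize > 1          -- keep only rows that actually have a duplicate
ORDER BY GroupNumber, personID;
```

With this sample data, persons 1 and 2 fall into group 1 and persons 3 and 4 into group 2, while person 5 is filtered out because it has no duplicate.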

SQL Server : return value in specific table2 column based on value in table1

I have a query that gets data from 2 tables.
Transaction table contains week_id, customer_id, upc12, sales_dollars
Products table contains upc12, column_1, column_2, column_3
I want my query to return the value in products table, based on what the customer_id is in the transaction table. customer_id = 1 should return column_1, customer_id = 2 should return column_3, etc.
SELECT
t.week_id,
customer_id,
upc12,
p.___________ sum(t.sales_dollars)
FROM
transaction t, products p
WHERE
t.upc_12 = p.upc_12
GROUP BY
t.week_id, customer_id, upc12, p.___________
Sorry if this makes no sense, but my research hasn't been very good, as I don't know how to correctly formulate my question. You probably guessed I'm new to SQL.
Thanks!
Here is one way to do it:
;WITH cte as
(
    SELECT
        t.week_id,
        t.customer_id,
        t.upc12,
        CASE t.customer_id
            WHEN 1 THEN p.column_1
            WHEN 2 THEN p.column_2
            WHEN 3 THEN p.column_3
        END As ColByCustomer,
        t.sales_dollars
    FROM [transaction] t  -- TRANSACTION is a reserved word, so the table name needs brackets
    INNER JOIN products p on t.upc12 = p.upc12
)
SELECT week_id, customer_id, upc12, ColByCustomer, SUM(sales_dollars) AS sales_dollars
FROM cte
GROUP BY week_id, customer_id, upc12, ColByCustomer

SQL Select set of records from one table, join each record to top 1 record of second table matching 1 column, sorted by a column in the second table

This is my first question on here, so I apologize if I break any rules.
Here's the situation. I have a table that lists all the employees and the building to which they are assigned, plus training hours, with ssn as the id column. I have another table that lists all the employees in the company, also keyed by ssn, but including name and other personal data. The second table contains multiple records for each employee, at different points in time. What I need to do is select all the records in the first table from a certain building, then get the most recent name from the second table, and allow the result set to be sorted by any of the returned columns.
I have this in place, and it works fine, it is just very slow.
A very simplified version of the tables is:
table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours DEC(5,2)) (7200 rows)
table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), sequence INT) (708,000 rows)
The sequence column in table 2 is a number that corresponds to a predetermined date to enter these records, the higher number, the more recent the entry. It is common/expected that each employee has several records. But several may not have the most recent(i.e. '8').
My SProc is:
@BuildingNumber CHAR(7), @SortField VARCHAR(25)
BEGIN
DECLARE @returnValue TABLE(ssn CHAR(9), buildingNumber CHAR(7), fname VARCHAR(20), lName VARCHAR(20), rowNumber INT)
INSERT INTO @returnValue(...)
SELECT(ssn,buildingNum,fname,lname,rowNum)
FROM SELECT(...,CASE @SortField Row_Number() OVER (PARTITION BY buildingNumber ORDER BY {sortField column} END AS RowNumber)
FROM table1 a
OUTER APPLY(SELECT TOP 1 fName,lName FROM table2 WHERE ssn = a.ssn ORDER BY sequence DESC) AS e
where buildingNumber = @BuildingNumber
SELECT * from @returnValue ORDER BY RowNumber
END
I have indexes for the following:
table1: buildingNumber(non-unique,nonclustered)
table2: sequence_ssn(unique,nonclustered)
Like I said this gets me the correct result set, but it is rather slow. Is there a better way to go about doing this?
It's not possible to change the database structure or the way table 2 operates. Trust me if it were it would be done. Are there any indexes I could make that would help speed this up?
I've looked at the execution plans, and it has a clustered index scan on table 2(18%), then a compute scalar(0%), then an eager spool(59%), then a filter(0%), then top n sort(14%).
That's 78% of the execution so I know it's in the section to get the names, just not sure of a better(faster) way to do it.
The reason I'm asking is that table 1 needs to be updated with current data. This is done through a webpage with a radgrid control. It has a range, start index, all that, and it takes forever for the users to update their data.
I can change how the update process is done, but I thought I'd ask about the query first.
Thanks in advance.
I would approach this with window functions. The idea is to assign a sequence number to the records in the table with duplicates (I think table2), such that the most recent record for each employee has a value of 1. Then just select the rows with that value:
select t1.*, t2.*
from table1 t1 join
     (select t2.*,
             row_number() over (partition by ssn order by sequence desc) as seqnum
      from table2 t2
     ) t2
     on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber;
My second suggestion is to use a user-defined function rather than a stored procedure:
create function XXX (
    @BuildingNumber char(7)
)
returns table as
return (
    select t1.ssn, t1.buildingNumber, t2.fName, t2.lName
    from table1 t1 join
         (select t2.*,
                 row_number() over (partition by ssn order by sequence desc) as seqnum
          from table2 t2
         ) t2
         on t1.ssn = t2.ssn and t2.seqnum = 1
    where t1.buildingNumber = @BuildingNumber
);
(This doesn't have the logic for the ordering because that doesn't seem to be the central focus of the question.)
You can then call it as:
select *
from dbo.XXX(<building number>);
EDIT:
The following may speed it up further, because you are only selecting a small(ish) subset of the employees:
select *
from (select t1.*, t2.fName, t2.lName, t2.sequence,
             row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
      from table1 t1 join
           table2 t2
           on t1.ssn = t2.ssn
      where t1.buildingNumber = @BuildingNumber
     ) t
where seqnum = 1;
And, finally, I suspect that the following might be the fastest:
select t1.*, t2.*, row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
from table1 t1 join
     table2 t2
     on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber and
      t2.sequence = (select max(t2a.sequence) from table2 t2a where t2a.ssn = t1.ssn)
In all these cases, an index on table2(ssn, sequence) should help performance.
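As a rough illustration of that index, the sketch below builds it on temp-table stand-ins so it can be run without touching the real schema. The names #table1, #table2 and IX_table2_ssn_sequence, the INCLUDE list, and the placeholder building number are all assumptions made for the example. If the existing sequence_ssn index is keyed with sequence first, it cannot be used to seek on ssn, which is why ssn leads here.

```sql
-- Hypothetical stand-ins for table1 and table2 from the question.
CREATE TABLE #table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours DEC(5,2));
CREATE TABLE #table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), [sequence] INT);

-- ssn leads so each employee's rows sit together; [sequence] DESC puts the newest row first.
-- INCLUDE makes the index cover the name lookup, avoiding a trip to the base table.
CREATE NONCLUSTERED INDEX IX_table2_ssn_sequence
    ON #table2 (ssn, [sequence] DESC)
    INCLUDE (fName, lName);

DECLARE @BuildingNumber CHAR(7) = '0000001';  -- placeholder value

-- Same shape as the OUTER APPLY in the question: newest name per employee.
SELECT a.ssn, a.buildingNumber, e.fName, e.lName
FROM #table1 AS a
OUTER APPLY (SELECT TOP (1) t.fName, t.lName
             FROM #table2 AS t
             WHERE t.ssn = a.ssn
             ORDER BY t.[sequence] DESC) AS e
WHERE a.buildingNumber = @BuildingNumber;

DROP TABLE #table1;
DROP TABLE #table2;
```

With an index shaped like this, each TOP (1) lookup can become a seek on ssn instead of a scan plus sort, though the actual plan depends on the data and should be checked.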
Try using temp tables instead of table variables. I'm not sure what kind of system you are working on, but I have had pretty good luck with them. Both kinds live in tempdb, but temp tables get column statistics and can be indexed after creation, so the optimizer tends to produce better plans once they hold more than a handful of rows. Depending on other system usage, this might do the trick.
Simply define the temp table using #Tablename instead of @Tablename. Put the name-sorting subquery into a temp table before everything else fires off, and join to it.
Just make sure to drop the table at the end. It is dropped automatically when the stored procedure finishes, but it is a good idea to drop it explicitly to be on the safe side.

How can I get this query to return 0 instead of null?

I have this query:
SELECT (SUM(tblTransaction.AmountPaid) - SUM(tblTransaction.AmountCharged)) AS TenantBalance, tblTransaction.TenantID
FROM tblTransaction
GROUP BY tblTransaction.TenantID
But there's a problem with it; there are other TenantID's that don't have transactions and I want to get those too.
For example, the transaction table has 3 rows for bob, 2 rows for john and none for jane. I want it to return the sum for bob and john AND return 0 for jane (or possibly null if there's no other way).
How can I do this?
Tables are like this:
Tenants
ID
Other Data
Transactions
ID
TenantID (fk to Tenants)
Other Data
(You didn't state your sql engine, so I'm going to link to the MySQL documentation).
This is pretty much exactly what the COALESCE() function is meant for. You can feed it a list, and it'll return the first non-null value in the list. You would use this in your query as follows:
SELECT COALESCE((SUM(tr.AmountPaid) - SUM(tr.AmountCharged)), 0) AS TenantBalance, te.ID
FROM tblTenant AS te
LEFT JOIN tblTransaction AS tr ON (tr.TenantID = te.ID)
GROUP BY te.ID;
That way, if the SUM() result would be NULL, it's replaced with zero.
Edited: I rewrote the query using a LEFT JOIN as well as the COALESCE(), I think this is the key of what you were missing originally. If you only select from the Transactions table, there is no way to get information about things not in the table. However, by using a left join from the Tenants table, you should get a row for every existing tenant.
Below is a full walkthrough of the problem. The function isnull has also been included to ensure that a balance of zero (rather than null) is returned for Tenants with no transactions.
create table tblTenant
(
ID int identity(1,1) primary key not null,
Name varchar(100)
);
create table tblTransaction
(
ID int identity(1,1) primary key not null,
tblTenantID int,
AmountPaid money,
AmountCharged money
);
insert into tblTenant(Name)
select 'bob' union all select 'Jane' union all select 'john';
insert into tblTransaction(tblTenantID,AmountPaid, AmountCharged)
select 1,5.00,10.00
union all
select 1,10.00,10.00
union all
select 1,10.00,10.00
union all
select 2,10.00,15.00
union all
select 2,15.00,15.00
select * from tblTenant
select * from tblTransaction
SELECT
tenant.ID,
tenant.Name,
isnull(SUM(Trans.AmountPaid) - SUM(Trans.AmountCharged),0) AS Balance
FROM tblTenant tenant
LEFT JOIN tblTransaction Trans ON
tenant.ID = Trans.tblTenantID
GROUP BY tenant.ID, tenant.Name;
drop table tblTenant;
drop table tblTransaction;
Select Tenants.ID, ISNULL((SUM(Transactions.AmountPaid) - SUM(Transactions.AmountCharged)), 0) AS TenantBalance
From Tenants
Left Outer Join Transactions On Tenants.ID = Transactions.TenantID
Group By Tenants.ID
I didn't syntax check it but it is close enough.
SELECT (SUM(ISNULL(tblTransaction.AmountPaid, 0))
- SUM(ISNULL(tblTransaction.AmountCharged, 0))) AS TenantBalance
, tblTransaction.TenantID
FROM tblTransaction
GROUP BY tblTransaction.TenantID
I only added this because if your intention is to account for one of the parts being null, you'll need to apply ISNULL to each one separately.
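The difference between wrapping the whole expression and wrapping each column can be seen in a small sketch. The @t table variable and its two rows are made up for illustration: a tenant who has charges but whose AmountPaid is NULL on every row.

```sql
-- Hypothetical data: tenant 1 has been charged twice and AmountPaid is NULL on both rows.
DECLARE @t TABLE (TenantID INT, AmountPaid MONEY, AmountCharged MONEY);
INSERT INTO @t (TenantID, AmountPaid, AmountCharged)
VALUES (1, NULL, 10.00), (1, NULL, 5.00);

SELECT TenantID,
       -- SUM over only NULLs is NULL, and NULL minus anything is NULL.
       SUM(AmountPaid) - SUM(AmountCharged)                        AS NoIsNull,    -- NULL
       -- Wrapping the whole expression hides the charges completely.
       ISNULL(SUM(AmountPaid) - SUM(AmountCharged), 0)             AS WrappedOnce, -- 0.00
       -- Wrapping each column keeps the charged amount in the balance.
       SUM(ISNULL(AmountPaid, 0)) - SUM(ISNULL(AmountCharged, 0))  AS PerColumn    -- -15.00
FROM @t
GROUP BY TenantID;
```

Which behaviour is wanted depends on whether a NULL amount should count as zero or as "unknown".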
Actually, I found an answer:
SELECT tenant.ID, ISNULL(SUM(trans.AmountPaid) - SUM(trans.AmountCharged),0) AS Balance FROM tblTenant tenant
LEFT JOIN tblTransaction trans
ON tenant.ID = trans.TenantID
GROUP BY tenant.ID

Resources