Database Design Relational Algebra query

I have this schema:
Suppliers(sid: integer, sname: string, address: string)
Parts(pid: integer, pname: string, color: string)
Catalog(sid: integer, pid: integer, cost: real)
And this task:
Find the sids of suppliers who supply every part.
What I don't understand is why this solution doesn't negate the comparison itself. I was tempted to write C1.pid <> P.pid instead of C1.pid = P.pid at the end. Can someone explain?
SELECT C.sid
FROM Catalog C
WHERE NOT EXISTS (SELECT P.pid
FROM Parts P
WHERE NOT EXISTS (SELECT C1.sid
FROM Catalog C1
WHERE C1.sid = C.sid
AND C1.pid = P.pid))

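Let's say you have 2 parts and 1 supplier. The supplier has both parts. If you compare with <>, your innermost subquery still gets a row back for each part: for Part #1 it finds the Catalog entry for Part #2 (because Part #2 <> Part #1 is true), and for Part #2 it finds the entry for Part #1 (likewise). So the inequality tests "does this supplier supply some other part?", not "does this supplier fail to supply this part?".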
Your reasoning isn't entirely off, but the way to express it is not an inequality. Use an outer join from Parts to the supplier's Catalog rows and test for the missing record on the optional (Catalog) side:
SELECT c.sid
FROM catalog c
WHERE NOT EXISTS
(SELECT p.pid
FROM parts p LEFT JOIN catalog c1 ON c1.pid = p.pid AND c1.sid = c.sid
WHERE c1.pid IS NULL)
Personally, I find the nested not exists to be a little confusing and needlessly complex. I would be more likely to solve this problem using count:
SELECT c.sid
FROM catalog c
GROUP BY c.sid
HAVING COUNT (DISTINCT c.pid) = (SELECT COUNT (*) FROM parts)
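
To make the difference concrete, here is a minimal sketch that runs the double NOT EXISTS query with = and with <>, plus the COUNT version, against a small in-memory SQLite database. The sample rows (three parts, supplier 10 with all of them, supplier 20 with only two) and the helper name sids are assumptions made up for illustration, and DISTINCT is added so each sid appears once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts   (pid INTEGER PRIMARY KEY, pname TEXT, color TEXT);
    CREATE TABLE Catalog (sid INTEGER, pid INTEGER, cost REAL);
    -- Hypothetical data: supplier 10 supplies every part, supplier 20 misses part 3.
    INSERT INTO Parts   VALUES (1, 'bolt', 'red'), (2, 'nut', 'green'), (3, 'cog', 'blue');
    INSERT INTO Catalog VALUES (10, 1, 1.0), (10, 2, 2.0), (10, 3, 3.0),
                               (20, 1, 1.5), (20, 2, 2.5);
""")

# {op} is filled with "=" (correct) or "<>" (the tempting but wrong variant).
DOUBLE_NOT_EXISTS = """
    SELECT DISTINCT C.sid
    FROM Catalog C
    WHERE NOT EXISTS (SELECT P.pid
                      FROM Parts P
                      WHERE NOT EXISTS (SELECT C1.sid
                                        FROM Catalog C1
                                        WHERE C1.sid = C.sid
                                          AND C1.pid {op} P.pid))
    ORDER BY C.sid
"""

COUNT_BASED = """
    SELECT c.sid
    FROM Catalog c
    GROUP BY c.sid
    HAVING COUNT(DISTINCT c.pid) = (SELECT COUNT(*) FROM Parts)
    ORDER BY c.sid
"""

def sids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

with_equals = sids(DOUBLE_NOT_EXISTS.format(op="="))       # only the full supplier
with_not_equals = sids(DOUBLE_NOT_EXISTS.format(op="<>"))  # also lets supplier 20 through
count_based = sids(COUNT_BASED)

print(with_equals, with_not_equals, count_based)
```

With <>, supplier 20 passes because for every part it supplies at least one different part, which is not the same as supplying that part.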

Related

Why do these SQL Server queries produce different outcomes?

DATABASE: Adventure Works 2017
Query #1:
select
c.CustomerID, c.FirstName, c.EmailAddress,
a.AddressID as addrId, addr.AddressLine1 as address
from
[AdventureWorksLT2017].[SalesLT].[Customer] c
left join
(
[AdventureWorksLT2017].[SalesLT].[CustomerAddress] a
inner join
[AdventureWorksLT2017].[SalesLT].[Address] addr
on addr.AddressID = a.AddressID
)
on c.CustomerID=a.CustomerID
where
c.EmailAddress = 'orlando0@adventure-works.com';
This is the result:
https://i.stack.imgur.com/hJlmQ.png
Query #2
select
c.CustomerID, c.FirstName, c.EmailAddress,
a.AddressID as addrId, addr.AddressLine1 as address
from
[AdventureWorksLT2017].[SalesLT].[Customer] c
left join
[AdventureWorksLT2017].[SalesLT].[CustomerAddress] a
on a.CustomerID = c.CustomerID
inner join
[AdventureWorksLT2017].[SalesLT].[Address] addr
on addr.AddressID = a.AddressID
where
c.EmailAddress = 'orlando0@adventure-works.com';
This is the result of query #2:
https://i.stack.imgur.com/XwOdg.png
The result I want is the one from query #1. I tried the second query expecting it to produce the same result, but it doesn't.
Can anyone explain why?

The answer is simple: it's all about the join ordering.
Starting with the second version, we do the following steps:
1. Take all Customer rows.
2. Left join their CustomerAddress rows, so none of the rows from step 1 are removed.
3. Inner join Address to the result of steps 1 and 2. A Customer without a CustomerAddress has a NULL a.AddressID, which matches no Address, so only rows that already have a match remain in the result set.
Whereas in version one:
1. Take all Customer rows.
2. Take all CustomerAddress rows...
3. ...and inner join Address to step 2 only, so only CustomerAddress rows that match an Address remain.
4. Then left join the combined result of steps 2 and 3 to step 1, so no Customer rows are removed.
This means that the first version keeps Customer rows which do not have an Address, whereas version two removes them. Version one is more likely the intended behavior.
Important note:
The parentheses themselves are not what does this. It's the fact that the inner join is nested between the left join and its respective ON clause. In other words, it's the order in which the joins are evaluated, determined by where each ON clause sits, that counts.
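
The two evaluation orders can be reproduced on a toy dataset. The sketch below uses Python's sqlite3 with simplified, made-up tables (two customers, one of whom has no address). Version one is written as a derived table named j, which is an assumption of this sketch: it stands in for the parenthesised join and produces the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer        (CustomerID INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE Address         (AddressID INTEGER PRIMARY KEY, AddressLine1 TEXT);
    CREATE TABLE CustomerAddress (CustomerID INTEGER, AddressID INTEGER);
    -- Hypothetical data: customer 1 has no address, customer 2 has one.
    INSERT INTO Customer        VALUES (1, 'Orlando'), (2, 'Keith');
    INSERT INTO Address         VALUES (100, '1 Main St');
    INSERT INTO CustomerAddress VALUES (2, 100);
""")

# Version one: the inner join is resolved first, then left-joined to Customer.
nested = conn.execute("""
    SELECT c.CustomerID, j.AddressLine1
    FROM Customer c
    LEFT JOIN (SELECT a.CustomerID, addr.AddressLine1
               FROM CustomerAddress a
               JOIN Address addr ON addr.AddressID = a.AddressID) j
           ON j.CustomerID = c.CustomerID
    ORDER BY c.CustomerID
""").fetchall()

# Version two: left join first, then an inner join on the possibly NULL AddressID.
sequential = conn.execute("""
    SELECT c.CustomerID, addr.AddressLine1
    FROM Customer c
    LEFT JOIN CustomerAddress a ON a.CustomerID = c.CustomerID
    JOIN Address addr ON addr.AddressID = a.AddressID
    ORDER BY c.CustomerID
""").fetchall()

print(nested)      # customer 1 is kept, with a NULL address
print(sequential)  # customer 1 is dropped by the trailing inner join
```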

Interview question help on relatively basic JOIN and subqueries

I was asked to:
Print the following sequence of columns for each plant that only blooms in one type of weather.
WEATHER_TYPE
PLANT_NAME
Schema
PLANTS (table name)
PLANT_NAME, string, The name of the plant. This is the primary key.
PLANT_SPECIES, string, The species of the plant.
SEED_DATE, date, The date the seed was planted.
WEATHER (table name)
PLANT_SPECIES, string, The species of the plant.
WEATHER_TYPE, string, The type of weather in which the plant will bloom.
I wrote the script below and tested it against sample input and achieved a desired result. I don't know if this is what is considered a 'printed' result.
Seeking understanding on what I might have missed. How might I make this script 'more efficient' and/or 'better' and/or 'more robust'?
SELECT WEATHER.WEATHER_TYPE, a.PLANT_NAME
FROM (SELECT b.PLANT_NAME, b.PLANT_SPECIES
FROM (SELECT PLANTS.PLANT_NAME, PLANTS.PLANT_SPECIES, PLANTS.SEED_DATE, WEATHER.WEATHER_TYPE
FROM PLANTS JOIN WEATHER
ON PLANTS.PLANT_SPECIES = WEATHER.PLANT_SPECIES) b
GROUP BY b.PLANT_NAME, b.PLANT_SPECIES
HAVING count(*) = 1) a JOIN WEATHER
ON a.PLANT_SPECIES = WEATHER.PLANT_SPECIES
I achieved the expected result in a SQL Server Management Studio window, but not sure if it's the 'printed' result the question-askers are looking for.

I personally consider CTEs easier to read and to debug, compared to nested "Table Expressions", as you have done. I would have done something like:
with
x as (
select p.plant_name, p.plant_species
from plants p
join weather w on w.plant_species = p.plant_species
group by p.plant_name, p.plant_species
having count(*) = 1
)
select w.weather_type, x.plant_name
from x
join weather w on w.plant_species = x.plant_species

I have to agree with The Impaler that CTEs are easier to read and debug than nested table expressions. As an alternative to the CTE (which is really the better choice), if you want to nest things without overthinking it, you can use a correlated subquery. It's easier to read, though you'll lose efficiency as your result set grows.
SELECT w.weather_type, p.plant_name
FROM plants p
JOIN weather w
ON w.plant_species = p.plant_species
WHERE (SELECT COUNT(1) FROM dbo.weather WHERE plant_species = w.plant_species) = 1
or with grouping...
SELECT w.weather_type, p.plant_name
FROM plants p
JOIN weather w
ON w.plant_species = p.plant_species
WHERE w.plant_species IN (SELECT plant_species FROM dbo.weather GROUP BY plant_species HAVING COUNT(1) = 1)
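
As a quick check that the two variants above agree, here is a small sketch using Python's sqlite3. The sample plants and weather rows are invented for illustration, and the dbo. schema prefix is dropped because SQLite has no such schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plants  (plant_name TEXT PRIMARY KEY, plant_species TEXT, seed_date TEXT);
    CREATE TABLE weather (plant_species TEXT, weather_type TEXT);
    -- Hypothetical data: roses bloom in one weather type, ferns in two.
    INSERT INTO plants  VALUES ('Rosie', 'rose', '2020-03-01'), ('Fernie', 'fern', '2020-04-01');
    INSERT INTO weather VALUES ('rose', 'Sunny'), ('fern', 'Rainy'), ('fern', 'Cloudy');
""")

# Correlated subquery: count the weather rows for each joined species.
correlated = conn.execute("""
    SELECT w.weather_type, p.plant_name
    FROM plants p
    JOIN weather w ON w.plant_species = p.plant_species
    WHERE (SELECT COUNT(1) FROM weather w2
           WHERE w2.plant_species = w.plant_species) = 1
""").fetchall()

# Grouped subquery: precompute the species that have exactly one weather row.
grouped = conn.execute("""
    SELECT w.weather_type, p.plant_name
    FROM plants p
    JOIN weather w ON w.plant_species = p.plant_species
    WHERE w.plant_species IN (SELECT plant_species FROM weather
                              GROUP BY plant_species
                              HAVING COUNT(1) = 1)
""").fetchall()

print(correlated, grouped)
```

Both queries return only the plant whose species has a single weather row.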

SELECT w.weather_type, p.plant_name
FROM plants p
JOIN weather w
ON w.plant_species = p.plant_species
WHERE w.weather_type = 'Sunny';

T-SQL stored procedure with joins

I have a problem with a stored procedure. I have 3 tables for a mass mailing service and I want to know how many tasks (table - MMProcessItem) I still need to do...
I have these 3 tables:
Here is my select:
SELECT
MMAddress.AddressID, MMProcess.ProcessID
FROM
MMProcess, MMAddress
LEFT OUTER JOIN
(SELECT *
FROM MMProcessItem) Items ON Items.AddressID = MMAddress.AddressID
WHERE
Items.ResultID IS NULL
ORDER BY
ProcessID, AddressID
My SQL code works fine if there is nothing in the MMProcessItem table. This is what I get:
But if I send 1 email, say the one with AddressID = 1 and ProcessID = 1, I no longer get the record with AddressID = 1 and ProcessID = 2. I should get a total of 3 records, but what I get is a total of 2 records...
Sorry if this is an amateur mistake; I'm not used to working with T-SQL and doing this type of thing...

Your join to MMProcessItem requires two predicates, one to join to MMProcess, and one to join to MMAddress. You are currently only joining to MMAddress. That means that when you add a record with AddressID = 1 and ProcessID = 1 it removes both records where AddressID = 1, not just the one record where AddressID is 1 and ProcessID is 1.
You could rewrite your query as:
SELECT a.AddressID, p.ProcessID
FROM MMProcess AS p
CROSS JOIN MMAddress AS a
LEFT OUTER JOIN MMProcessItem AS i
ON i.AddressID = a.AddressID
AND i.ProcessID = p.ProcessID
WHERE i.ResultID IS NULL
ORDER BY p.ProcessID, a.AddressID;
Note the use of explicit join syntax, and also aliases for brevity.
Since you are using the LEFT JOIN to MMProcessItem solely to remove records, you might find that NOT EXISTS conveys the intention better and, more importantly, it can also perform better.
SELECT a.AddressID, p.ProcessID
FROM MMProcess AS p
CROSS JOIN MMAddress AS a
WHERE NOT EXISTS
( SELECT 1
FROM MMProcessItem AS i
WHERE i.AddressID = a.AddressID
AND i.ProcessID = p.ProcessID
)
ORDER BY p.ProcessID, a.AddressID;
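
The effect of the missing predicate can be reproduced with a tiny dataset. The sketch below uses Python's sqlite3 with assumed minimal table definitions (the real tables surely have more columns): two processes, two addresses, and one item already sent for AddressID = 1 and ProcessID = 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal versions of the three tables.
    CREATE TABLE MMProcess     (ProcessID INTEGER PRIMARY KEY);
    CREATE TABLE MMAddress     (AddressID INTEGER PRIMARY KEY);
    CREATE TABLE MMProcessItem (ProcessID INTEGER, AddressID INTEGER, ResultID INTEGER);
    INSERT INTO MMProcess     VALUES (1), (2);
    INSERT INTO MMAddress     VALUES (1), (2);
    INSERT INTO MMProcessItem VALUES (1, 1, 1);   -- one email already sent
""")

# Joining on AddressID only: the sent item hides AddressID = 1 for every process.
one_predicate = conn.execute("""
    SELECT a.AddressID, p.ProcessID
    FROM MMProcess AS p
    CROSS JOIN MMAddress AS a
    LEFT OUTER JOIN MMProcessItem AS i ON i.AddressID = a.AddressID
    WHERE i.ResultID IS NULL
    ORDER BY p.ProcessID, a.AddressID
""").fetchall()

# Joining on both keys: only the exact (address, process) pair is excluded.
two_predicates = conn.execute("""
    SELECT a.AddressID, p.ProcessID
    FROM MMProcess AS p
    CROSS JOIN MMAddress AS a
    LEFT OUTER JOIN MMProcessItem AS i
           ON i.AddressID = a.AddressID AND i.ProcessID = p.ProcessID
    WHERE i.ResultID IS NULL
    ORDER BY p.ProcessID, a.AddressID
""").fetchall()

# NOT EXISTS form of the same anti-join.
not_exists = conn.execute("""
    SELECT a.AddressID, p.ProcessID
    FROM MMProcess AS p
    CROSS JOIN MMAddress AS a
    WHERE NOT EXISTS (SELECT 1 FROM MMProcessItem AS i
                      WHERE i.AddressID = a.AddressID AND i.ProcessID = p.ProcessID)
    ORDER BY p.ProcessID, a.AddressID
""").fetchall()

print(len(one_predicate), len(two_predicates), len(not_exists))
```

One caveat of the LEFT JOIN form: it relies on ResultID never being NULL in a real MMProcessItem row, whereas NOT EXISTS does not depend on that column at all.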

Why do I have duplicate records in my JOIN

I am retrieving data from table ProductionReportMetrics where I have column NetRate_QuoteID. Then to that result set I need to get Description column.
And in order to get a Description column, I need to join 3 tables:
NetRate_Quote_Insur_Quote
NetRate_Quote_Insur_Quote_Locat
NetRate_Quote_Insur_Quote_Locat_Liabi
But after that my premium is completely off.
What am I doing wrong here?
SELECT QLL.Description,
QLL.ClassCode,
prm.NetRate_QuoteID,
QL.LocationID,
ISNULL(SUM(premium),0) AS NetWrittenPremium,
MONTH(prm.EffectiveDate) AS EffMonth
FROM ProductionReportMetrics prm
LEFT JOIN NetRate_Quote_Insur_Quote Q
ON prm.NetRate_QuoteID = Q.QuoteID
INNER JOIN NetRate_Quote_Insur_Quote_Locat QL
ON Q.QuoteID = QL.QuoteID
INNER JOIN NetRate_Quote_Insur_Quote_Locat_Liabi QLL
ON QL.LocationID = QLL.LocationID
WHERE YEAR(prm.EffectiveDate) = 2016 AND
CompanyLine = 'Ironshore Insurance Company'
GROUP BY MONTH(prm.EffectiveDate),
QLL.Description,
QLL.ClassCode,
prm.NetRate_QuoteID,
QL.LocationID
I think the problem is in this table:
What am I missing in this query?
select
ClassCode,
QLL.Description,
sum(Premium)
from ProductionReportMetrics prm
LEFT JOIN NetRate_Quote_Insur_Quote Q ON prm.NetRate_QuoteID = Q.QuoteID
LEFT JOIN NetRate_Quote_Insur_Quote_Locat QL ON Q.QuoteID = QL.QuoteID
LEFT JOIN
(SELECT * FROM NetRate_Quote_Insur_Quote_Locat_Liabi nqI
JOIN ( SELECT LocationID, MAX(ClassCode)
FROM NetRate_Quote_Insur_Quote_Locat_Liabi GROUP BY LocationID ) nqA
ON nqA.LocationID = nqI.LocationID ) QLL ON QLL.LocationID = QL.LocationID
where Year(prm.EffectiveDate) = 2016 AND CompanyLine = 'Ironshore Insurance Company'
GROUP BY Q.QuoteID,QL.QuoteID,QL.LocationID
Now it says
Msg 8156, Level 16, State 1, Line 14
The column 'LocationID' was specified multiple times for 'QLL'.

It looks like DVT basically hit on the answer. The only reason a join would give you different amounts (i.e. duplicated rows) is that one of the joined tables does not have a 1:1 relationship with the primary table.
I would suggest a quick check against those tables, comparing row counts.
--this should be your baseline count
SELECT COUNT(*)
FROM ProductionReportMetrics prm
GROUP BY MONTH(prm.EffectiveDate),
prm.NetRate_QuoteID
--this will be a check against the first joined table.
SELECT COUNT(*)
FROM NetRate_Quote_Insur_Quote Q
WHERE Q.QuoteID IN
(SELECT prm.NetRate_QuoteID
FROM ProductionReportMetrics prm
GROUP BY MONTH(prm.EffectiveDate),
prm.NetRate_QuoteID)
Basically you will want to do a similar check against each of your joined tables. If any of the joined tables are part of the grouping statement, make sure they are also in the grouping of the count check statement. Also make sure to alter the WHERE clause of the check count statement to use the join clause columns you were using.
Once you find a table that returns more rows than the baseline, you will have your answer as to which table is causing the problem. Then you will just have to decide how to limit that table down to distinct rows (some type of aggregation).
This advice is really just to show you how to QA this particular query. Break it up into the smallest possible parts. In this case, we know that it is a join that is causing the problem, so take it one join at a time until you find the offender.
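
The one-join-at-a-time check can be sketched on a simplified model of the problem. The tables metrics, location and liability below are hypothetical stand-ins for ProductionReportMetrics, NetRate_Quote_Insur_Quote_Locat and NetRate_Quote_Insur_Quote_Locat_Liabi, reduced to the join keys and the premium, and run through Python's sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for the real tables.
    CREATE TABLE metrics   (quote_id INTEGER, premium REAL);
    CREATE TABLE location  (quote_id INTEGER, location_id INTEGER);
    CREATE TABLE liability (location_id INTEGER, class_code TEXT);
    INSERT INTO metrics   VALUES (1, 100.0), (2, 50.0);
    INSERT INTO location  VALUES (1, 11), (2, 21);
    INSERT INTO liability VALUES (11, 'A'), (11, 'B'), (21, 'C');  -- location 11 has two rows
""")

# Baseline: the premium total before any join.
true_total = conn.execute("SELECT SUM(premium) FROM metrics").fetchone()[0]

# After the joins, quote 1 is repeated once per liability row, inflating the sum.
joined_total = conn.execute("""
    SELECT SUM(m.premium)
    FROM metrics m
    JOIN location  l ON l.quote_id = m.quote_id
    JOIN liability q ON q.location_id = l.location_id
""").fetchone()[0]

# Check the suspect table: which join keys occur more than once?
offenders = conn.execute("""
    SELECT location_id, COUNT(*)
    FROM liability
    GROUP BY location_id
    HAVING COUNT(*) > 1
""").fetchall()

print(true_total, joined_total, offenders)
```

Any key listed in offenders multiplies the rows of the primary table, which is the same kind of inflation seen in NetWrittenPremium.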

Is my Relational Algebra correct?

I have a database assignment for which I have to write relational algebra for two problems. I feel fairly all right with the majority of it, but I get confused when trying to project attributes out of a table which is joined to another table.
For example, is this correct?
Q1) List the details of incidents with no calls made, so that the receptionist knows which incidents still need to be called in.
RESULT <-- PROJECT<STUDENT.FirstName, STUDENT.LastName, STAFF.FirstName,
STAFF.INCIDENT.LastName, INCIDENT.DateTimeReported,
INCIDENT.NatureOfIllness(SELECTINCIDENT.DecisionMade =
''(Staff RIGHT JOIN<STAFF.StaffID = INCIDENT.StaffID>
(INCIDENT LEFT JOIN<INCIDENT.StudentID = STUDENT.StudentID>(STUDENT))))
The SQL which I am trying to interpret into relational algebra is:
SELECT
s.FirstName, s.LastName, st.FirstName, st.LastName
, i.DateTimeReported, i.NatureOfIllness
FROM Student s
RIGHT JOIN Incident i ON s.StudentID = i.StudentID
LEFT JOIN Staff st ON st.StaffID = i.StaffID
WHERE i.DecisionMade = ''
Any points of advice would be much appreciated.

It's usually (some exceptions apply, of course) easier to read and understand the SQL if you write it all with LEFT JOINs:
SELECT s.FirstName, s.LastName, st.FirstName, st.LastName, i.DateTimeReported, i.NatureOfIllness
FROM Incident i
LEFT JOIN Student s ON s.StudentID = i.StudentID
LEFT JOIN Staff st ON st.StaffID = i.StaffID
WHERE i.DecisionMade = ''

Your version seems correct, except for some typos like STAFF.INCIDENT.LastName. Here's my version:
RESULT <---
PROJECT <STUDENT.FirstName, STUDENT.LastName,
STAFF.FirstName, STAFF.LastName,
INCIDENT.DateTimeReported, INCIDENT.NatureOfIllness>
(SELECT <INCIDENT.DecisionMade = ''>
((STUDENT RIGHT JOIN <STUDENT.StudentID = INCIDENT.StudentID> INCIDENT)
LEFT JOIN <INCIDENT.StaffID = STAFF.StaffID> STAFF))