Database Schema question - database

I am having a difficult time deciding how to handle a business requirement in my database schema. I have a lot of tables in the database, but there are only three I need to deal with for this problem: Courses, PersonnelCourses, and Personnel.
Courses is a list of courses.
Personnel is a list of personnel.
PersonnelCourses is a list of courses that personnel have taken.
In Courses there is a column called Universal. If a course is universal, that means all personnel must take that course.
I need to generate a list of all the universal courses that Personnel must take, but the only way I am able to generate this list is with a cross join / cartesian join:
select P.LastName, C.Name
from Courses C, Personnel P
where Universal = 1
From that I want to do a left join onto PersonnelCourses so that I can have a list of all the Personnel and the Courses they must take as well as the courses they have taken. I'm thinking this would all be easier if there was a many to many table between Personnel and Courses. But if all Personnel are going to be in this middle table anyway, isn't that a bit redundant?
Is there a better way to handle this?
Much appreciated,
-Matt

There is a list of courses that everybody has to take. Why not just take this list and work with it, instead of repeating the same list for every personnel row? I don't understand why you are trying to multiply your result set.

Isn't your PersonnelCourses table establishing a many-to-many relationship between Personnel and Courses? If it isn't, then I am not sure. If it is, then...
select *
from Personnel_Courses
inner join Person on... /*get the Person details*/
inner join Courses on... /*get the Course details*/
where Courses.Universal = 1 and Person.Id = @Id
would tell you what universal courses they have taken...
and then
select *
from Courses
where Courses.Universal = 1 and Courses.Id not in (
select Courses.Id from Personnel_Courses
inner join Person on... /*get the Person details*/
inner join Courses on... /*get the Course details*/
where Courses.Universal = 1 and Person.Id = @Id
)
Would give you the universal courses that they haven't taken...
To me, it might be easier to do the second step in your application code: get the first query's results, select all the universal courses from the Courses table, and then compare the two.

This is a matter of database normalization, a topic that whole books have been written about.
Part of the reason you want to do this is DRY (don't repeat yourself).
So to answer your question about whether there is a better way: I would say no.

How about something like this (using the existing structure)?
SELECT P.LastName, C.Name, 1 as Taken
FROM Courses C
INNER JOIN PersonnelCourses PC ON (C.CourseID=PC.CourseID)
INNER JOIN Personnel P ON (P.PersonID=PC.PersonID)
WHERE(C.Universal = 1)
UNION
SELECT P.LastName, C.Name, 0 as Taken
FROM Courses C, Personnel P
WHERE (C.Universal = 1) and
NOT EXISTS(SELECT * FROM PersonnelCourses PC2
WHERE (PC2.CourseID=C.CourseID) and
(PC2.PersonID=P.PersonID)
)
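The same result can also be reached with the cross join plus left join that the question describes, without a UNION. The following is a minimal sketch using Python's built-in sqlite3 module; the column names CourseID and PersonID follow the answer above, and the sample rows are invented purely for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Courses (CourseID INTEGER PRIMARY KEY, Name TEXT, Universal INTEGER);
    CREATE TABLE Personnel (PersonID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE PersonnelCourses (PersonID INTEGER, CourseID INTEGER,
                                   PRIMARY KEY (PersonID, CourseID));
    INSERT INTO Courses VALUES (1, 'Safety', 1), (2, 'Ethics', 1), (3, 'Welding', 0);
    INSERT INTO Personnel VALUES (1, 'Adams'), (2, 'Baker');
    INSERT INTO PersonnelCourses VALUES (1, 1), (1, 3), (2, 2);
""")

# Every person paired with every universal course; the left join marks
# whether a matching PersonnelCourses row exists.
rows = conn.execute("""
    SELECT P.LastName, C.Name,
           CASE WHEN PC.CourseID IS NULL THEN 0 ELSE 1 END AS Taken
    FROM Personnel P
    CROSS JOIN Courses C
    LEFT JOIN PersonnelCourses PC
           ON PC.PersonID = P.PersonID AND PC.CourseID = C.CourseID
    WHERE C.Universal = 1
    ORDER BY P.LastName, C.Name
""").fetchall()

for row in rows:
    print(row)
```

The cross join is deliberate here: the required list really is "every person times every universal course", so no extra many-to-many table is needed to produce it.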

Universal is a two-value (boolean) attribute of Course, right? In that case, consider normalizing further. Redesign so that UniversalCourse is a table, not a column on Course. That table would have a single column referencing the course. To find all universal courses, simply select everything from this table. Now you can shorten your cartesian join considerably since you have to multiply Personnel only by the UniversalCourse table, having eliminated the where Universal = 1 clause.

Not all personnel will be in the PersonnelCourses table, only personnel who have taken courses.
I think your design is fine. You just need to tweak your query to get what you're after.
In a subquery pull the courses the personnel have taken. Then in an outer query select all the courses that the personnel must take and do a left outer join with the subquery.
Select a.Name, b.LastName
from Courses a
left outer join
(select c.courseid, p.LastName
from Courses c, Personnel p, PersonnelCourses pc
where c.courseid = pc.courseid and
p.personnelid = pc.personnelid and
c.Universal = 1) b
on a.courseid = b.courseid
where a.Universal = 1
order by a.courseid
It would probably be best to filter by personnel, if this is for a report. That way you would see all of the courses required including the ones taken per person.

Related

Ideal practice for implementing a 1:1 relationship

In a webshop, there are two (relevant to this question) tables: UserSnapshot and Purchase. Upon making a purchase, the user's current information is snapshot so that the purchase records are intact even if the user is later removed or changed. This gives a 1:1 relationship, where each purchase has only one user snapshot, and each user snapshot has only one purchase.
My question is, how should I implement this? Should I have a foreign key pointing to the user snapshot in the purchase table, the other way around, or should I use both (redundant)? Should I combine the two (messy)? Serialise the user snapshot (does not obey 'one value per field')?
I'd suggest looking at the likely queries you want to run, and design your model on that basis.
For instance, I guess you want to know "which orders has this customer placed?". The most natural way of expressing that would be something like:
select *
from customer c
inner join customer_snapshot cs
on c.customer_id = cs.customer_id
inner join orders o
on cs.order_id = o.order_id
where c.customer_id = ?
Or: "What is the current status of the customer who placed this order?".
select *
from orders o
inner join customer_snapshot cs
on o.order_id = cs.order_id
inner join customer c
on cs.customer_id = c.customer_id
where o.order_id = ?
This feels natural to me, as it almost uses the customer_snapshot table as a "many to many" joining table.
But that's mostly stylistic - the join could just as easily be on o.customer_snapshot_id = cs.customer_snapshot_id.
How about "how many orders were sent to customers living in city x?"
select *
from orders o
inner join customer_snapshot cs
on o.order_id = cs.order_id
inner join customer c
on cs.customer_id = c.customer_id
and cs.city = ?
You don't need "redundant" columns - all queries work without jumping through hoops. You could serialize the snapshot data, but then the "which orders were for customers living in city x" query would be painful.
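One way to make the 1:1 relationship explicit is a single foreign key on the snapshot side with a UNIQUE constraint. The sketch below uses Python's sqlite3 module; the table layout, the column names beyond those used in the queries above, and the sample rows are all assumptions made for illustration, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    -- UNIQUE on order_id means each order has at most one snapshot.
    CREATE TABLE customer_snapshot (
        customer_snapshot_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        order_id INTEGER NOT NULL UNIQUE REFERENCES orders(order_id),
        name TEXT,
        city TEXT
    );
    INSERT INTO customer VALUES (1, 'Ann', 'Oslo');
    INSERT INTO orders VALUES (10), (11);
    INSERT INTO customer_snapshot VALUES
        (100, 1, 10, 'Ann', 'Bergen'),
        (101, 1, 11, 'Ann', 'Oslo');
""")

# "Which orders were sent to customers living in city x?" uses the snapshot's city.
orders_in_city = conn.execute("""
    SELECT o.order_id
    FROM orders o
    INNER JOIN customer_snapshot cs ON cs.order_id = o.order_id
    WHERE cs.city = ?
    ORDER BY o.order_id
""", ("Bergen",)).fetchall()

# A second snapshot for the same order violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO customer_snapshot VALUES (102, 1, 10, 'Ann', 'Oslo')")
    second_snapshot_rejected = False
except sqlite3.IntegrityError:
    second_snapshot_rejected = True

print(orders_in_city, second_snapshot_rejected)
```

With this layout no redundant key is needed on the orders table, and the city query stays a plain join.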

SQL: Summing columns with a similar column in common

I'm extremely new to SQL Server, so I apologize if the question is worded strangely. I am doing a homework assignment, and this is the question:
"A manager wants to know the email address, number of orders, and the total amount of purchases made by each customer. Create a summary query that returns these three items for each customer that has orders."
I have all of the data queried, the problem is when I pull data from each customer, it will show the quantity of items per order, and I need the items to be pooled together into one column. This is my query thus far (again, total noob, please excuse any poor syntax, etc.)
SELECT EmailAddress,
ItemPrice - DiscountAmount * Quantity AS TotalPurchaseAmount,
COUNT(*) AS OrderQty
FROM Customers
JOIN Orders ON Customers.CustomerID = Orders.CustomerID
JOIN OrderItems ON Orders.OrderID = OrderItems.OrderID
GROUP BY Orders.CustomerID,
OrderItems.ItemPrice, OrderItems.DiscountAmount,
OrderItems.Quantity,
Customers.EmailAddress;
The following is a small bit of the result set that I get:
EmailAddress                   TotalPurchaseAmount   OrderQty
allan.sherwood@yahoo.com       253.15                2
allan.sherwood@yahoo.com       839.30                2
allan.sherwood@yahoo.com       1208.16               2
barryz@gmail.com               303.79                4
christineb@solarone.com        479.60                2
david.goldstein@hotmail.com    299.00                2
david.goldstein@hotmail.com    489.30                1
david.goldstein@hotmail.com    479.60                1
So as you can see, I have several orders I need to smoosh together into one single row per e-mail. I have looked and looked for an answer, but the only thing I can find is how to find duplicates and ignore them, not how to combine their data. Any help is extremely appreciated, thanks so much for taking the time to read this :) If my question doesn't make sense, please let me know so I can clear up any bad wording I may have used!
Just do GROUP BY CustomerID, EmailAddress:
SELECT
c.EmailAddress,
SUM((i.ItemPrice - i.DiscountAmount) * i.Quantity) AS TotalPurchaseAmount,
COUNT(DISTINCT o.OrderID) AS OrderQty
FROM Customers c
INNER JOIN Orders o
ON c.CustomerID = o.CustomerID
INNER JOIN OrderItems i
ON o.OrderID = i.OrderID
GROUP BY
c.CustomerID, c.EmailAddress
Additional note: Use aliases for your tables
You need to change your formula and remove the columns that you don't want to group by from the GROUP BY clause.
for example your query should be something like this
SELECT EmailAddress,
--do your aggregation here
blah AS TotalPurchaseAmount,
COUNT(*) AS OrderQty
FROM Customers
JOIN Orders ON Customers.CustomerID = Orders.CustomerID
JOIN OrderItems ON Orders.OrderID = OrderItems.OrderID
GROUP BY Orders.CustomerID,
Customers.EmailAddress;
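To see how the grouping behaves, here is a small sketch using Python's sqlite3 module with invented sample data (the ItemID column and all rows are assumptions for illustration). Because the join yields one row per order item, COUNT(*) counts item rows, while COUNT(DISTINCT o.OrderID) counts orders; the sketch returns both so the difference is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, EmailAddress TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    CREATE TABLE OrderItems (ItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                             ItemPrice REAL, DiscountAmount REAL, Quantity INTEGER);
    INSERT INTO Customers VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO Orders VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO OrderItems VALUES
        (1, 1, 100.0, 10.0, 2),
        (2, 1, 50.0, 0.0, 1),
        (3, 2, 20.0, 5.0, 1),
        (4, 3, 30.0, 0.0, 3);
""")

# One output row per customer: grouping only by the customer's columns
# lets SUM and COUNT collapse all of that customer's item rows.
rows = conn.execute("""
    SELECT c.EmailAddress,
           SUM((i.ItemPrice - i.DiscountAmount) * i.Quantity) AS TotalPurchaseAmount,
           COUNT(DISTINCT o.OrderID) AS OrderQty,
           COUNT(*) AS ItemRows
    FROM Customers c
    INNER JOIN Orders o ON c.CustomerID = o.CustomerID
    INNER JOIN OrderItems i ON o.OrderID = i.OrderID
    GROUP BY c.CustomerID, c.EmailAddress
    ORDER BY c.EmailAddress
""").fetchall()

for row in rows:
    print(row)
```

Note the parentheses around the price minus discount: without them, the multiplication binds first and the discount is multiplied by the quantity before being subtracted.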

Prevent duplicates sql server

In this query I'm extracting information from five tables: Company, Programmer, Tester, Manager, and Contract. I want the programmers', testers', and managers' names and telephone numbers, as well as the company they work for. The company is the one responsible for managing a given program (requested by some person, which doesn't matter here).
The problem is that, with the code below, each piece of information comes out as many times as there are other pieces of information: a programmer's name and telephone number come out as many times as there are managers and testers at the company.
I tried a left outer join and it gave me even more results. How can I fix this so that a result isn't duplicated but shows NULL instead?
SELECT DISTINCT pg.name,
pg.Tel_Nr,
Mg.name,
Mg.Tel_Nr,
Ts.Name,
Ts.Tel_Nr,
Pg.Name,
con.program_name
FROM Company AS Cm
INNER JOIN Programmer AS Pg ON Pg.company = Cm.name
INNER JOIN Manager AS Mg ON Mg.company = Cm.name
INNER JOIN Tester AS Ts ON Ts.company = Cm.name
INNER JOIN Contract AS Con ON Con.program_name = 'My Program'
AND Cm.name = Con.Company
Surely it would make more sense to produce a list of contact details with perhaps a job description. Something like this:
WITH Cte as (select Cm.name from
Contract as Con join Company as Cm on Cm.name = Con.Company
where Con.program_name = 'My Program')
SELECT pg.name, pg.Tel_Nr, 'Programmer' as JobTitle
FROM Cte INNER JOIN
Programmer as Pg on Pg.company = Cte.name
UNION ALL
SELECT Mg.name, Mg.Tel_Nr, 'Manager' as JobTitle
FROM Cte INNER JOIN
Manager as Mg on Mg.company= Cte.name
UNION ALL
SELECT Ts.Name, Ts.Tel_Nr, 'Tester' as JobTitle
FROM Cte INNER JOIN
Tester as Ts on Ts.company = Cte.name
This solution uses a Common Table Expression (labelled Cte) to avoid querying the Company and Contract tables multiple times.
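The shape of the output can be checked with a small sketch in Python's sqlite3 module. The table and column names follow the queries above; the sample rows are invented for illustration, and the ORDER BY on column positions is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (name TEXT PRIMARY KEY);
    CREATE TABLE Contract (program_name TEXT, Company TEXT);
    CREATE TABLE Programmer (name TEXT, Tel_Nr TEXT, company TEXT);
    CREATE TABLE Manager (name TEXT, Tel_Nr TEXT, company TEXT);
    CREATE TABLE Tester (name TEXT, Tel_Nr TEXT, company TEXT);
    INSERT INTO Company VALUES ('Acme'), ('Other');
    INSERT INTO Contract VALUES ('My Program', 'Acme');
    INSERT INTO Programmer VALUES ('Pat', '111', 'Acme'), ('Quinn', '222', 'Acme'),
                                  ('Zed', '999', 'Other');
    INSERT INTO Manager VALUES ('Mia', '333', 'Acme');
    INSERT INTO Tester VALUES ('Tom', '444', 'Acme');
""")

# Each role contributes its own rows, so no person is repeated once per
# member of the other roles, which is what the three-way inner join did.
contacts = conn.execute("""
    WITH Cte AS (
        SELECT Cm.name
        FROM Contract AS Con
        JOIN Company AS Cm ON Cm.name = Con.Company
        WHERE Con.program_name = 'My Program'
    )
    SELECT Pg.name, Pg.Tel_Nr, 'Programmer' AS JobTitle
    FROM Cte INNER JOIN Programmer AS Pg ON Pg.company = Cte.name
    UNION ALL
    SELECT Mg.name, Mg.Tel_Nr, 'Manager' AS JobTitle
    FROM Cte INNER JOIN Manager AS Mg ON Mg.company = Cte.name
    UNION ALL
    SELECT Ts.name, Ts.Tel_Nr, 'Tester' AS JobTitle
    FROM Cte INNER JOIN Tester AS Ts ON Ts.company = Cte.name
    ORDER BY 3, 1
""").fetchall()

for contact in contacts:
    print(contact)
```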

Relational Algebra union, join and intersect

I'm studying computer science and am brushing up on database systems. I'm having difficulties grasping certain parts.
Say I have the following relations:
Lecturers(LecturerID, Name, DeptID)
Course(DeptID, CrsCode, CrsName, Description)
I note that they both share a common attribute, DeptID, therefore they are union-compatible.
How would I go about listing all courses that are taught by lecturers belonging to computer science dept (CS) or electronic engineering dept (eEng)?
My answer would be using intersection with selection. Would the following be correct or near the mark?
πDeptID,CrsName(Course) intersection πDeptID,Name(σDeptID = CS or DeptID = eEng(Lecturers))
I'm sure join could be used here, but I'm unsure how to use the predicate with it.
Thanks for your help. Once I understand what to use in a few situations I'm sure the rest will be easier.
Thanks for any help.
I would use a simple INNER JOIN for this.
SELECT A.DEPTID, A.CRSNAME
FROM COURSE A
INNER JOIN LECTURERS B on A.DEPTID=B.DEPTID
WHERE B.DEPTID='eENG' or B.DEPTID='CS'
There must also be a table for departments, since you have a reference to the DeptID field, which should be an INT. I assume it is DEPARTMENTS, with DeptID and Code as fields. In this case:
SELECT
*
FROM
Course C
INNER JOIN
LECTURERS L on C.DeptId = L.DeptID
INNER JOIN
Departments D on C.DeptID = D.DeptID
WHERE
D.code = 'eENG' or D.code = 'CS'
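One thing worth checking with a join on DeptID is that a course appears once per lecturer in its department. The sketch below, written with Python's sqlite3 module, assumes DeptID holds the department code directly (as in the question's relations) and uses invented rows; DISTINCT plays the role that projection plays in relational algebra, where duplicates are removed automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Lecturers (LecturerID INTEGER PRIMARY KEY, Name TEXT, DeptID TEXT);
    CREATE TABLE Course (DeptID TEXT, CrsCode TEXT, CrsName TEXT, Description TEXT);
    INSERT INTO Lecturers VALUES (1, 'Ada', 'CS'), (2, 'Alan', 'CS'),
                                 (3, 'Erin', 'eEng'), (4, 'Max', 'Math');
    INSERT INTO Course VALUES ('CS', 'CS101', 'Databases', ''),
                              ('eEng', 'EE201', 'Circuits', ''),
                              ('Math', 'MA101', 'Calculus', '');
""")

join_sql = """
    SELECT {distinct} C.DeptID, C.CrsName
    FROM Course C
    INNER JOIN Lecturers L ON C.DeptID = L.DeptID
    WHERE L.DeptID IN ('CS', 'eEng')
    ORDER BY C.DeptID, C.CrsName
"""

# Without DISTINCT the CS course is listed once for each CS lecturer.
with_duplicates = conn.execute(join_sql.format(distinct="")).fetchall()
courses = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

print(with_duplicates)
print(courses)
```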

Performance problem on a query

I have a performance problem on a query.
The first table is a Customer table, which has millions of records in it. The Customer table has email address columns and some other information about the customer.
The second table is a CommunicationInfo table, which contains just email addresses.
What I want is: how many times does each email address in the CommunicationInfo table repeat in the Customer table? What would be the most performant query?
The basic query that I can explain this situation is;
Select ci.Email, count(*) from Customer c left join
CommunicationInfo ci on c.Email1 = ci.Email or c.Email2 = ci.Email
Group by ci.Email
But sure, it takes about 5, 6 minutes in execution.
Thanks in Advance.
This query is about as good as it gets, provided you have indexes on the Customer email columns and another on CommunicationInfo.Email:
Select
c.Email, count(*)
from Customer c
left join CommunicationInfo ci on c.Email1 = ci.Email
left join CommunicationInfo ci2 on c.Email2 = ci2.Email
Group by c.Email
You mention:
"What I want is: how many times does each email address in the CommunicationInfo table repeat in the Customer table? What would be the most performant query?"
To me, that sounds like you could easily use an INNER JOIN. This would most likely be a lot faster, since it limits the search to just those customers whose e-mail address matches a row in CommunicationInfo. Customers without a matching e-mail address will not appear in the join result at all, which might make a big difference in the number of rows SQL Server has to count and group.
So try this:
SELECT
ci.Email, COUNT(*)
FROM
dbo.Customer c
INNER JOIN dbo.CommunicationInfo ci
ON c.Email1 = ci.Email OR c.Email2 = ci.Email
GROUP BY
ci.Email
How does that perform in your case??
Using the OR condition robs the optimizer of opportunity to use HASH JOIN or MERGE JOIN.
Use this:
SELECT q2.Email, SUM(q2.cnt)
FROM (
SELECT ci.Email, COUNT(c.Email1) AS cnt
FROM CommunicationInfo ci
LEFT JOIN
Customer c
ON c.Email1 = ci.Email
GROUP BY
ci.Email
UNION ALL
SELECT ci.Email, COUNT(c.Email2) AS cnt
FROM CommunicationInfo ci
LEFT JOIN
Customer c
ON c.Email2 = ci.Email
GROUP BY
ci.Email
) q2
GROUP BY
q2.Email
or this:
SELECT ci.Email, COUNT(q.Email)
FROM CommunicationInfo ci
LEFT JOIN
(
SELECT Email1 AS email
FROM Customer c
UNION ALL
SELECT Email2
FROM Customer
) q
ON q.Email = ci.Email
GROUP BY
ci.Email
Make sure that you have indexes on Customer(Email1) and Customer(Email2).
The first query will be more efficient if your emails are mostly not filled; the second one will be more efficient if most emails are filled.
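The second form can be tried out on a tiny data set. This is a sketch using Python's sqlite3 module, not a benchmark: the CustomerID column, the index names and the rows are invented, and SQLite's planner differs from SQL Server's. It counts q.Email rather than using COUNT(*), so that addresses with no matching customer report 0 instead of 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Email1 TEXT, Email2 TEXT);
    CREATE TABLE CommunicationInfo (Email TEXT);
    CREATE INDEX ix_customer_email1 ON Customer (Email1);
    CREATE INDEX ix_customer_email2 ON Customer (Email2);
    INSERT INTO Customer VALUES (1, 'a@x.com', 'b@x.com'),
                                (2, 'a@x.com', NULL),
                                (3, 'c@x.com', 'a@x.com');
    INSERT INTO CommunicationInfo VALUES ('a@x.com'), ('b@x.com'), ('z@x.com');
""")

# Stack Email1 and Email2 into one column, then join on plain equality
# instead of an OR condition.
counts = conn.execute("""
    SELECT ci.Email, COUNT(q.Email) AS cnt
    FROM CommunicationInfo ci
    LEFT JOIN (
        SELECT Email1 AS Email FROM Customer
        UNION ALL
        SELECT Email2 FROM Customer
    ) q ON q.Email = ci.Email
    GROUP BY ci.Email
    ORDER BY ci.Email
""").fetchall()

for email, cnt in counts:
    print(email, cnt)
```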
Depending on your environment there may not be much you can do to optimize this.
A couple of questions:
How many records in CommunicationInfo?
How often do you really need to run this query? Is it a one time analysis, or are multiple people going to be running this every 10 minutes?
Are the fields indexed? I'll make a guess that neither Email1 nor Email2 field is indexed. However, I wouldn't suggest adding an index without taking the balance of the whole system into consideration.
Why are you using a left join? Do you really need EVERYTHING from the Customer table? You're counting, so no harm in doing an INNER JOIN.
Suggestions:
Run the query through the Query Optimization wizard to see if there is anything SQL Server would recommend.
An extreme suggestion would be to dump the Email1 and Email2 columns into a temp table and join to that. I've seen queries run slowly because of a large amount of stress on a particular table, so sometimes copying the records into a temp table is faster, but this technique is very dependent on how much memory there is, how fast IO is, and the amount of stress on a particular table.

Resources