DB Schema Taxonomy for species : conditional joins or redesign? - sql-server

I need some advice on designing a DB schema. I am working on a project where I need to classify species.
All species belong to a gender, a family, an order, a class, a branch and finally a kingdom.
But some of them have a subBranch between class and branch. I first gave the species entity a FK pointing to each single taxonomy level.
Then I thought about giving the species entity only the "gender FK" and going all the way up from there to get its full taxonomy. It seemed to work, but I realised I could not retrieve the subBranch for the species that have one.
In the class entity I have two FKs, one for subBranch and one for branch. Depending on the species, either the branchId FK is set in the class entity (and then the subBranch FK is null), leading to Branch and then Kingdom, or the subBranch FK is set and leads to the SubBranch, then from there to Branch and finally Kingdom.
In SQL, I have something like this for the species view (I commented in English where I am stuck):
SELECT
S.*,
G.LatinName as 'GenderLatinName',
G.Name as 'GenderName',
F.LatinName as 'FamilyLatinName',
F.Name as 'FamilyName',
O.LatinName as 'OrderLatinName',
O.Name as 'OrderName',
C.LatinName as 'ClassLatinName',
C.Name as 'ClassName',
Sb.LatinName as 'SubBranchLatinName',
Sb.Name as 'SubBranchName',
B.LatinName as 'BranchLatinName',
B.Name as 'BranchName',
K.LatinName as 'KingdomLatinName',
K.Name as 'KingdomName'
from Species S
join Gender G on G.Id = S.GenderId
join Family F on F.Id = G.FamilyId
join [Order] O on O.Id = F.OrderId
join Class C on C.Id = O.ClassId
--if class entity has an existing SubBranchId then join SubBranch to it and then the Branch to the SubBranch
-- if C.SubBranchId is not null
-- then join SubBranch on Sb.BranchId on C.BranchId
-- then join Branch on B.Id on Sb.BranchId
--if class entity has no SubBranchId then straightaway join Branch to it
-- else
-- join Branch on B.Id on C.BranchId
join Branch B on B.Id = C.BranchId
join Kingdom K on K.Id = B.KingdomId
I have seen some questions on conditional joins but I could not get them to work. I thought about UNION ALL, but the number of columns varies between the two queries, as one has an additional field.
Perhaps the schema design needs to be changed.
How should I go about this?

Your schema is generally fine, apart from two rather minor notes:
In taxonomy, it's usually called "Genus", not "Gender";
I strongly suggest coming up with some other name for the Order table. Trust me, if you have to write any amount of code worth mentioning against a schema like that, you'll curse the day you chose a table name that matches this particular reserved keyword. Orders, Order_, OrderT (from "Taxonomy") - anything will do.
As such, the query should be quite simple:
select s.*,
-- Other columns
isnull(sb.LatinName, '(No sub-branch)') as [SubBranchLatinName],
-- The rest of stuff
from Species S
inner join Genus G on G.Id = S.GenusId
inner join Family F on F.Id = G.FamilyId
inner join OrderT O on O.Id = F.OrderId
inner join Class C on C.Id = O.ClassId
inner join Branch B on B.Id = C.BranchId
left join dbo.SubBranch sb on sb.BranchId = b.Id and sb.Id = c.SubBranchId
inner join Kingdom K on K.Id = B.KingdomId
A LEFT JOIN lets you bring in a table that may contain no rows matching the condition, without losing the rows from the left-hand side in the final output.
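If, as the question describes, Class.BranchId is NULL whenever Class.SubBranchId is set, one possible variation is to resolve the branch through whichever key is present. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, COALESCE in place of ISNULL, a reduced three-table schema, and made-up sample rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch    (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE SubBranch (Id INTEGER PRIMARY KEY, Name TEXT,
                        BranchId INTEGER REFERENCES Branch(Id));
CREATE TABLE Class     (Id INTEGER PRIMARY KEY, Name TEXT,
                        BranchId INTEGER REFERENCES Branch(Id),
                        SubBranchId INTEGER REFERENCES SubBranch(Id));
-- Hypothetical sample data: one class with a sub-branch, one without.
INSERT INTO Branch VALUES (1, 'BranchA'), (2, 'BranchB');
INSERT INTO SubBranch VALUES (10, 'SubBranchA1', 1);
INSERT INTO Class VALUES (100, 'ClassWithSub', NULL, 10);
INSERT INTO Class VALUES (200, 'ClassWithoutSub', 2, NULL);
""")

rows = conn.execute("""
SELECT c.Name,
       COALESCE(sb.Name, '(No sub-branch)') AS SubBranchName,
       b.Name AS BranchName
FROM Class c
-- The LEFT JOIN keeps classes that have no sub-branch.
LEFT JOIN SubBranch sb ON sb.Id = c.SubBranchId
-- Reach the branch via the sub-branch when there is one, else directly.
INNER JOIN Branch b ON b.Id = COALESCE(sb.BranchId, c.BranchId)
ORDER BY c.Id
""").fetchall()

for row in rows:
    print(row)
```

Both classes come back with a branch, and the class without a sub-branch shows the placeholder text instead of being dropped.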

I think that having a star schema, i.e. having the SPECIES table contain FKs to each taxonomy table, would be faster for selects. It would also remove the complications of conditional joins and any other logical "anomalies".
If you want to stick with a chain, then there are two ways:
Conditional join:
Example below:
from Species S
join Gender G on G.Id = S.GenderId
join Family F on F.Id = G.FamilyId
join [Order] O on O.Id = F.OrderId
join Class C on C.Id = O.ClassId
LEFT JOIN SubBranch AS SB ON .....
INNER JOIN Branch AS B ON SB.BranchID = B.Id OR C.BranchID = B.Id
This will be slow.
A UNION ALL approach would probably be faster. To work around the difference in the number of columns, add NULL and/or empty-string constants in place of the SubBranch columns in the query without SubBranch.
Another way is to add dummy sub-branch records for classes without a sub-branch. This way there is always a record in the SubBranch table, and you do not need conditional joins. I recommend this solution.
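The UNION ALL idea with NULL padding could look roughly like this. It is a sketch only, run against SQLite through Python's sqlite3 module rather than SQL Server, with a reduced schema and made-up rows; the column layout of the real view is an assumption.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch    (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE SubBranch (Id INTEGER PRIMARY KEY, Name TEXT, BranchId INTEGER);
CREATE TABLE Class     (Id INTEGER PRIMARY KEY, Name TEXT,
                        BranchId INTEGER, SubBranchId INTEGER);
-- Hypothetical sample data.
INSERT INTO Branch VALUES (1, 'BranchA'), (2, 'BranchB');
INSERT INTO SubBranch VALUES (10, 'SubBranchA1', 1);
INSERT INTO Class VALUES (100, 'ClassWithSub', NULL, 10);
INSERT INTO Class VALUES (200, 'ClassWithoutSub', 2, NULL);
""")

rows = conn.execute("""
-- Classes that go through a sub-branch.
SELECT c.Name, sb.Name AS SubBranchName, b.Name AS BranchName
FROM Class c
JOIN SubBranch sb ON sb.Id = c.SubBranchId
JOIN Branch b ON b.Id = sb.BranchId
UNION ALL
-- Classes attached directly to a branch: pad the missing column with NULL
-- so both halves have the same number of columns.
SELECT c.Name, NULL, b.Name
FROM Class c
JOIN Branch b ON b.Id = c.BranchId
WHERE c.SubBranchId IS NULL
""").fetchall()

for row in rows:
    print(row)
```

The NULL constant keeps the column counts aligned; the reader of the view then sees NULL (None in Python) wherever no sub-branch exists.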

Related

SQLite join multiple values from two tables [duplicate]

Is there any difference (performance, best-practice, etc...) between putting a condition in the JOIN clause vs. the WHERE clause?
For example...
-- Condition in JOIN
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND CUS.FirstName = 'John'
-- Condition in WHERE
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE CUS.FirstName = 'John'
Which do you prefer (and perhaps why)?
Relational algebra allows the predicates in the WHERE clause and the INNER JOIN to be interchanged, so even INNER JOIN queries with WHERE clauses can have their predicates rearranged by the optimizer so that rows may already be excluded during the JOIN process.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
For inner joins I have not really noticed a difference (but as with all performance tuning, you need to check against your database under your conditions).
However where you put the condition makes a huge difference if you are using left or right joins. For instance consider these two queries:
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
The first will give you only those records that have an order dated later than May 15, 2009, thus converting the left join to an inner join.
The second returns every customer: those with an order after May 15, 2009 get their order data, and all other customers (including those with no orders at all) appear with NULLs in the order columns. The result set is very different depending on where you put the condition. (Select * is for example purposes only; of course you should not use it in production code.)
The exception is when you want only the records in one table that have no match in the other. Then you put the condition in the WHERE clause, not in the join.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderID is null
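The three LEFT JOIN variants above can be checked on a toy dataset. This is a small sketch using Python's sqlite3 module instead of SQL Server; the table contents are invented, the dbo. schema prefix is dropped, and dates are stored as YYYYMMDD text so that string comparison orders them correctly.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, FirstName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
                     OrderDate TEXT);
-- Hypothetical data: Ann has a recent order, Bob an old one, Cid none.
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO Orders VALUES (10, 1, '20090601'), (11, 2, '20090101');
""")

def customer_ids(sql):
    """Run a query and return the sorted CustomerID values it yields."""
    return sorted(row[0] for row in conn.execute(sql))

base = ("SELECT CUS.CustomerID FROM Customers AS CUS "
        "LEFT JOIN Orders AS ORD ON CUS.CustomerID = ORD.CustomerID ")

# Date filter in WHERE: rows with NULL order columns are discarded.
in_where = customer_ids(base + "WHERE ORD.OrderDate > '20090515'")
# Date filter in ON: every customer is kept, matched or not.
in_on = customer_ids(base + "AND ORD.OrderDate > '20090515'")
# Anti-join: customers with no order at all.
no_orders = customer_ids(base + "WHERE ORD.OrderID IS NULL")

print(in_where, in_on, no_orders)
```

With this data, the WHERE version keeps only Ann, the ON version keeps all three customers, and the IS NULL version keeps only Cid.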
Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference.
I prefer to keep join conditions separate from query restriction conditions.
If you're using OUTER JOIN sometimes it's necessary to put conditions in the join clause.
WHERE will filter after the JOIN has occurred.
Filter on the JOIN to prevent rows from being added during the JOIN process.
I prefer the JOIN to join full tables/views and then use the WHERE to introduce the predicate of the resulting set.
It feels syntactically cleaner.
I typically see performance increases when filtering on the join. Especially if you can join on indexed columns for both tables. You should be able to cut down on logical reads with most queries doing this too, which is, in a high volume environment, a much better performance indicator than execution time.
I'm always mildly amused when someone shows their SQL benchmarking and they've executed both versions of a sproc 50,000 times at midnight on the dev server and compare the average times.
I agree with the second-most-voted answer that it makes a big difference when using LEFT JOIN or RIGHT JOIN. In fact, the two statements below are equivalent, which shows that the AND condition in the join filters before the JOIN, while the WHERE clause filters after the JOIN.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN (SELECT * FROM dbo.Orders WHERE OrderDate >'20090515') AS ORD
ON CUS.CustomerID = ORD.CustomerID
Joins are quicker in my opinion when you have a larger table. It really isn't that much of a difference, though, especially if you are dealing with a rather small table. When I first learned about joins, I was told that conditions in joins are just like WHERE clause conditions and that I could use them interchangeably if the WHERE clause was specific about which table the condition applies to.
Putting the condition in the join seems "semantically wrong" to me, as that's not what JOINs are "for". But that's very qualitative.
Additional problem: if you decide to switch from an inner join to, say, a right join, having the condition be inside the JOIN could lead to unexpected results.
It is better to add the condition in the Join. Performance is more important than readability. For large datasets, it matters.

Ideal practice for implementing a 1:1 relationship

In a webshop, there are two tables relevant to this question: UserSnapshot and Purchase. Upon making a purchase, the user's current information is snapshotted so that the purchase records stay intact even if the user is later removed or changed. This gives a 1:1 relationship, where each purchase has only one user snapshot, and each user snapshot has only one purchase.
My question is, how should I implement this? Should I have a foreign key pointing to the user snapshot in the purchase table, the other way around, or should I use both (redundant)? Should I combine the two (messy)? Serialise the user snapshot (does not obey 'one value per field')?
I'd suggest looking at the likely queries you want to run, and design your model on that basis.
For instance, I guess you want to know "which orders has this customer placed?". The most natural way of expressing that would be something like:
select *
from customer c
inner join customer_snapshot cs
on c.customer_id = cs.customer_id
inner join orders o
on cs.order_id = o.order_id
where c.customer_id = ?
Or: "What is the current status of the customer who placed this order?".
select *
from orders o
inner join customer_snapshot cs
on o.order_id = cs.order_id
inner join customer c
on cs.customer_id = c.customer_id
where o.order_id = ?
This feels natural to me, as it almost uses the customer_snapshot table as a "many to many" joining table.
But that's mostly stylistic - the join could just as easily be on o.customer_snapshot_id = cs.customer_snapshot_id.
How about "how many orders were sent to customers living in city x?"
select *
from orders o
inner join customer_snapshot cs
on o.order_id = cs.order_id
inner join customer c
on cs.customer_id = c.customer_id
and cs.city = ?
You don't need "redundant" columns - all queries work without jumping through hoops. You could serialize the snapshot data, but then the "which orders were for customers living in city x" query would be painful.
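One way to make the database itself enforce the 1:1 rule is a UNIQUE, NOT NULL foreign key from the snapshot to the order. This is a sketch under assumptions, not the answer's stated design: the column names follow the queries above, the city values are invented, and SQLite (via Python's sqlite3) stands in for the real database.

```python
import sqlite3

# In-memory SQLite database used for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders   (order_id INTEGER PRIMARY KEY);
CREATE TABLE customer_snapshot (
    customer_snapshot_id INTEGER PRIMARY KEY,
    -- Nullable, so the snapshot can outlive the customer row.
    customer_id INTEGER REFERENCES customer(customer_id),
    -- UNIQUE + NOT NULL: each order has at most one snapshot and each
    -- snapshot belongs to exactly one order.
    order_id INTEGER NOT NULL UNIQUE REFERENCES orders(order_id),
    city TEXT
);
-- Hypothetical sample data.
INSERT INTO customer VALUES (1, 'Ann');
INSERT INTO orders VALUES (100);
INSERT INTO customer_snapshot VALUES (1, 1, 100, 'Oslo');
""")

# A second snapshot for the same order violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO customer_snapshot VALUES (2, 1, 100, 'Bergen')")
    second_snapshot_rejected = False
except sqlite3.IntegrityError:
    second_snapshot_rejected = True

print(second_snapshot_rejected)
```

The foreign key lives on only one side, so no redundant column is needed, and the queries above keep working unchanged.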

Optimize joins from multiple tables

How can I optimize the performance of the query below, given the table structure shown in the picture?
[Image: table structure]
select CounterID, OutletTitle, CounterTitle
from(
select OutletID, Text as OutletTitle
from Outlets as q1
inner join
TranslationTexts as tt
on q1.TitleID=tt.TranslationID
where tt.Locale='ar-SA' and q1.CompanyID=311 and q1.OutletID=8 --Locale & CompanyID & OutletID
) as O
inner join
(
select CounterID, Text as CounterTitle, OutletID
from Counters as q1
inner join
TranslationTexts as tt
on q1.TitleID=tt.TranslationID
where tt.Locale='ar-SA' and q1.OutletID=8 --Locale & OutletID
) as C
on O.OutletID=C.OutletID
You could try this query:
SELECT CounterID, tou.Text as OutletTitle, tco.Text as CounterTitle
FROM Counters as co
INNER JOIN Outlets as ou ON co.OutletID = ou.OutletID
INNER JOIN TranslationTexts as tco on co.TitleID=tco.TranslationID
INNER JOIN TranslationTexts as tou on ou.TitleID=tou.TranslationID
WHERE ou.CompanyID=311 and co.OutletID=8 AND tco.Locale='ar-SA' and tou.Locale='ar-SA'
To have much better performance, you could add some indexes on the 3 tables.
This is a different approach. I cannot say about improvement in performance because that depends on a lot of other things, but I believe it is an equivalent version and an easier one to read.
SELECT
C.CounterID
, tt.Text AS OutletTitle
, tt2.Text AS CounterTitle
FROM
Outlets AS q1
INNER JOIN TranslationTexts AS tt ON q1.TitleID=tt.TranslationID
INNER JOIN Counters C ON C.OutletID=q1.OutletID
-- tt2 must look up the counter's own title, not the outlet's
INNER JOIN TranslationTexts AS tt2 ON tt2.TranslationID=C.TitleID AND tt2.Locale=tt.Locale
WHERE
tt.Locale='ar-SA' and q1.CompanyID=311 and q1.OutletID=8;
The question is what you want to optimize.. readability (and maintainability) and/or performance ?
Most people have their own 'style' when writing queries. I prefer the one below, but to the server it will probably look the same and most likely the system will have the exact same amount of 'work' to get the data even though it 'looks' different to us humans. I'd suggest to google around a bit and learn how to interpret a Query Plan.
SELECT q2.CounterID,
tt1.Text as OutletTitle,
tt2.Text as CounterTitle
FROM Outlets as q1
INNER JOIN Counters as q2
ON q2.OutletID = q1.OutletID
INNER JOIN TranslationTexts as tt1
ON tt1.TranslationID = q1.TitleID
AND tt1.Locale = 'ar-SA'
INNER JOIN TranslationTexts as tt2
ON tt2.TranslationID = q2.TitleID
AND tt2.Locale = 'ar-SA'
WHERE q1.CompanyID = 311
AND q1.OutletID = 8
One of the things I notice is that you pass both CompanyID and OutletID as filters for the Outlets table. Since OutletID is the primary key of that table, I wonder if you really need the filter on CompanyID. At best it will eliminate the record because it belongs to the wrong company, but somehow I'm under the impression that you already know the right CompanyID.
As for performance, I'd advise these indexes:
CREATE INDEX idx_Locale ON TranslationTexts (Locale, TranslationID)
CREATE INDEX idx_CompanyID ON Outlets (CompanyID) INCLUDE (TitleID, OutletID)
Most likely you can even make the index on Locale a UNIQUE index, making it work even better.

How do I build a query that crosses multiple tables?

Here are my tables:
CUSTOMER
Cust_ID (PK)
Name
ORDERS
Order_ID (PK)
Cust_ID (FK)
ORDER_LINE
Order_ID (pk)
Part_ID (FK)
PART
Part_ID (PK)
Part_Description
Now I want to list the customer details, the part number and the description of the parts that each customer ordered.
How do I do this?
Thanks.
You should use "JOIN" using the FK, but from what I see you don't have a foreign key between "ORDERS" and "ORDER_LINE". Are you sure you're not missing something from the table definition, ie: ORDER_LINE should maybe have the ORDER_ID as a FK ?
Hope this helps
You can try something like
SELECT c.*,
p.*
FROM CUSTOMER c INNER JOIN
ORDERS o ON c.Cust_ID = o.Cust_ID INNER JOIN
ORDER_LINE ol ON o.Order_ID = ol.Order_ID INNER JOIN
PART p ON ol.Part_ID = p.Part_ID
Have a look at
Join (SQL)
An SQL join clause combines records from two or more tables in a
database.
SQL Joins
The JOIN keyword is used in an SQL statement to query data from two or
more tables, based on a relationship between certain columns in these
tables.
And for some graphic examples
JOIN Basics
What you need is a simple straightforward JOIN like so:
SELECT
c.Cust_ID,
c.Name,
l.Part_ID,
l.Part_Description
FROM CUSTOMER c
INNER JOIN ORDERS o ON c.Cust_ID = o.Cust_ID
INNER JOIN ORDER_LINE ol ON o.Order_ID = ol.Order_ID
INNER JOIN PART l ON ol.Part_ID = l.Part_ID
You want an SQL "join", such as:
SELECT c.Name, ol.Part_ID, p.Part_Description
FROM Customer AS c
JOIN Orders AS o ON c.Cust_ID = o.Cust_ID
JOIN Order_Line AS ol ON o.Order_ID = ol.Order_ID
JOIN Part AS p ON ol.Part_ID = p.Part_ID
Be aware that without a WHERE clause, this query will return all parts in all orders for all customers, which can put a heavy load on the network and perform poorly on anything but a tiny database:
WHERE (c.Cust_ID = MyCustomerID)
MySQL join syntax
SQL Server join syntax

Multiple Joins in TSQL

I am trying to JOIN multiple tables to the same value in a table. I have the table ActivityPartyBase, which has a column PartyId. I want to join ContactId in the ContactBase table to PartyId, and AccountId in the AccountBase table to PartyId. This is the code I am using, and it doesn't return anything. If I only join one, it works. Any ideas?
SELECT DISTINCT Appointment.ScheduledStart, ActivityPartyBase.ActivityId
, Appointment.ActivityId AS Expr1, ActivityPartyBase.ScheduledStart AS Expr2
, Appointment.Subject, ActivityPartyBase.PartyId, ContactBase.ContactId
, ContactBase.FullName
FROM Appointment
INNER JOIN ActivityPartyBase
ON Appointment.ActivityId = ActivityPartyBase.ActivityId
INNER JOIN AccountBase ON ActivityPartyBase.PartyId = AccountBase.AccountId
LEFT OUTER JOIN ContactBase ON ActivityPartyBase.PartyId = ContactBase.ContactId
ORDER BY Appointment.ScheduledStart DESC
Your inner joins are filtering out results because there is no corresponding record on the joined table. I've always found the easiest way to debug is to "Select *" and use all LEFT JOINs. This will show you everything in your tables that relates to your main table; you should be able to look at your data and figure out what table is missing a record easily at that point.
To confirm that this is just a naming convention mismatch,
INNER JOIN AccountBase ON ActivityPartyBase.PartyId = AccountBase.AccountId
Are PartyID and AccountId the PK/FK?
Given this...
FROM Appointment
INNER JOIN ActivityPartyBase ON Appointment.ActivityId = ActivityPartyBase.ActivityId
INNER JOIN AccountBase ON ActivityPartyBase.PartyId = AccountBase.AccountId
LEFT OUTER JOIN ContactBase ON ActivityPartyBase.PartyId = ContactBase.ContactId
... you state this works (?) ...
FROM Appointment
INNER JOIN ActivityPartyBase ON Appointment.ActivityId = ActivityPartyBase.ActivityId
/* INNER JOIN AccountBase ON ActivityPartyBase.PartyId = AccountBase.AccountId */
/* LEFT OUTER JOIN ContactBase ON ActivityPartyBase.PartyId = ContactBase.ContactId */
Since a LEFT OUTER JOIN cannot by itself cause an empty result, that won't be your problem. Since the INNER JOIN can cause what you're seeing, we can only deduce that its join condition never matches.
In other words, ActivityPartyBase.PartyId is not equal to AccountBase.AccountID.
Are you sure there is data in all three tables in the inner join?
I'm guessing one of your INNER JOINs isn't picking up any data. Start with all 3 joins, then take out one of the joins at a time see which one breaks it. Then look at your join conditions and see which column isn't returning a record.
Sounds to me as if the tables are mutually exclusive: if a party is in one table, it is not in the other (poor design). Try left joins to both tables.
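The suggestion to left join both tables can be tried out on a reduced version of the schema. This is only a sketch: it assumes each PartyId matches either an account or a contact but never both, uses SQLite through Python's sqlite3 module instead of SQL Server, and the Name column and all sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ActivityPartyBase (ActivityId INTEGER, PartyId INTEGER);
CREATE TABLE AccountBase (AccountId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ContactBase (ContactId INTEGER PRIMARY KEY, FullName TEXT);
-- Hypothetical data: party 1 is an account, party 2 is a contact.
INSERT INTO AccountBase VALUES (1, 'Acme');
INSERT INTO ContactBase VALUES (2, 'Jo Doe');
INSERT INTO ActivityPartyBase VALUES (500, 1), (501, 2);
""")

# INNER JOIN to AccountBase drops every party that is a contact.
inner_rows = conn.execute("""
SELECT ap.ActivityId, COALESCE(a.Name, c.FullName) AS PartyName
FROM ActivityPartyBase ap
INNER JOIN AccountBase a ON ap.PartyId = a.AccountId
LEFT JOIN ContactBase c ON ap.PartyId = c.ContactId
ORDER BY ap.ActivityId
""").fetchall()

# LEFT JOIN to both tables keeps every party, whichever table it lives in.
left_rows = conn.execute("""
SELECT ap.ActivityId, COALESCE(a.Name, c.FullName) AS PartyName
FROM ActivityPartyBase ap
LEFT JOIN AccountBase a ON ap.PartyId = a.AccountId
LEFT JOIN ContactBase c ON ap.PartyId = c.ContactId
ORDER BY ap.ActivityId
""").fetchall()

print(inner_rows)
print(left_rows)
```

COALESCE picks the name from whichever side matched; if every party in the real data were a contact, the INNER JOIN version would return nothing at all, which matches the symptom in the question.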
