Which of these two DB design approaches is better? - database

I have the following entities: EntityA, EntityB, EntityC and EntityD:
+----------+ +----------+ +----------+ +----------+
| EntityA  | | EntityB  | | EntityC  | | EntityD  |
+----------+ +----------+ +----------+ +----------+
| FC1      | | FC1      | | FC1      | | FC1      |
| FC2      | | FC2      | | FC2      | | FC2      |
| FC3      | | FC3      | | FC3      | | FC3      |
| FC4      | | FC4      | | FC4      | | FC4      |
| EA1      | | EB1      | | EC1      | | ED1      |
| EA2      | | EB2      | | EC2      | | ED2      |
| EA3      | |          | | EC3      | | ED3      |
| EA4      | |          | |          | | ED4      |
+----------+ +----------+ +----------+ +----------+
Each entity has properties FC1, FC2, FC3 and FC4 that are common across all the entities; and some properties are specific to the entity. Also each entity references every other entity in the domain. There is a many-to-many relation between entities.
Which of the following DB designs is better? Or is there any other better approach than the two described below?
1)
                              +-------------+
                              | Link        |
                              +-------------+
                          +---| id_T1(FK)   |
                          +---| id_T2(FK)   |
    +---------------+     |   +-------------+
    | TableCommon   |     |
    +---------------+     |
+-->| id(PK)        |<----+-------+------------------+------------------+
|   | FC1           |             |                  |                  |
|   | FC2           |             |                  |                  |
|   | FC3           |             |                  |                  |
|   | FC4           |             |                  |                  |
|   +---------------+             |                  |                  |
|                                 |                  |                  |
|   +----------+    +----------+  |    +----------+  |    +----------+  |
|   | TableA   |    | TableB   |  |    | TableC   |  |    | TableD   |  |
|   +----------+    +----------+  |    +----------+  |    +----------+  |
+---| id(FK)   |    | id(FK)   |--+    | id(FK)   |--+    | id(FK)   |--+
    | EA1      |    | EB1      |       | EC1      |       | ED1      |
    | EA2      |    | EB2      |       | EC2      |       | ED2      |
    | EA3      |    |          |       | EC3      |       | ED3      |
    | EA4      |    |          |       |          |       | ED4      |
    +----------+    +----------+       +----------+       +----------+
In this design, the common properties of the entities are stored in a separate table, TableCommon, which can be thought of as the base table from which all the other tables are derived. The Link table stores the references from one entity to another, representing the many-to-many relationships between entities.
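The first design can be sketched roughly as follows, using Python's built-in sqlite3 module so the schema can be run as-is. This is only a minimal sketch under assumptions: the column types, the composite primary key on Link, and the helper `add_entity` are illustrative choices that are not part of the question, and TableC and TableD are omitted because they follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE TableCommon (
    id  INTEGER PRIMARY KEY,
    FC1 TEXT, FC2 TEXT, FC3 TEXT, FC4 TEXT
);
-- Each entity table reuses the TableCommon id as its own key.
CREATE TABLE TableA (
    id  INTEGER PRIMARY KEY REFERENCES TableCommon(id),
    EA1 TEXT, EA2 TEXT, EA3 TEXT, EA4 TEXT
);
CREATE TABLE TableB (
    id  INTEGER PRIMARY KEY REFERENCES TableCommon(id),
    EB1 TEXT, EB2 TEXT
);
-- One link table covers every pair of entities.
CREATE TABLE Link (
    id_T1 INTEGER NOT NULL REFERENCES TableCommon(id),
    id_T2 INTEGER NOT NULL REFERENCES TableCommon(id),
    PRIMARY KEY (id_T1, id_T2)  -- assumption: a pair is linked at most once
);
""")

# Hypothetical helper data: the entity-specific columns per table.
SPECIFIC_COLUMNS = {
    "TableA": ("EA1", "EA2", "EA3", "EA4"),
    "TableB": ("EB1", "EB2"),
}

def add_entity(conn, table, common, specific):
    """Insert one entity: a TableCommon row plus the matching entity row."""
    cols = SPECIFIC_COLUMNS[table]
    placeholders = ", ".join("?" for _ in cols)
    with conn:  # both inserts succeed or fail together
        cur = conn.execute(
            "INSERT INTO TableCommon (FC1, FC2, FC3, FC4) VALUES (?, ?, ?, ?)",
            common,
        )
        new_id = cur.lastrowid
        conn.execute(
            f"INSERT INTO {table} (id, {', '.join(cols)}) VALUES (?, {placeholders})",
            (new_id, *specific),
        )
    return new_id

a_id = add_entity(conn, "TableA", ("x1", "x2", "x3", "x4"), ("a1", "a2", "a3", "a4"))
b_id = add_entity(conn, "TableB", ("y1", "y2", "y3", "y4"), ("b1", "b2"))
with conn:
    conn.execute("INSERT INTO Link (id_T1, id_T2) VALUES (?, ?)", (a_id, b_id))

# Everything linked from the EntityA row, with its common fields.
linked = conn.execute(
    "SELECT c.id, c.FC1 FROM Link l JOIN TableCommon c ON c.id = l.id_T2 "
    "WHERE l.id_T1 = ?",
    (a_id,),
).fetchall()
print(linked)
```

Note that the Link table alone does not say which entity type each id belongs to; finding that out needs a join against the entity tables or an extra type column in TableCommon.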
2)
+----------+          +----------+   +----------+   +----------+
| TableA   |          | TableB   |   | TableC   |   | TableD   |
+----------+          +----------+   +----------+   +----------+
| id(PK)   |<--+  +-->| id(PK)   |   | id(PK)   |   | id(PK)   |
| FC1      |   |  |   | FC1      |   | FC1      |   | FC1      |
| FC2      |   |  |   | FC2      |   | FC2      |   | FC2      |
| FC3      |   |  |   | FC3      |   | FC3      |   | FC3      |
| FC4      |   |  |   | FC4      |   | FC4      |   | FC4      |
| EA1      |   |  |   | EB1      |   | EC1      |   | ED1      |
| EA2      |   |  |   | EB2      |   | EC2      |   | ED2      |
| EA3      |   |  |   |          |   | EC3      |   | ED3      |
| EA4      |   |  |   |          |   |          |   | ED4      |
+----------+   |  |   +----------+   +----------+   +----------+
               |  |
+----------+   |  |   +----------+   +----------+   +----------+
| TableAB  |   |  |   | TableAC  |   | TableAD  |   | TableBC  |
+----------+   |  |   +----------+   +----------+   +----------+
| id_1(FK) |---+  |   | id_1(FK) |   | id_1(FK) |   | id_1(FK) | ...
| id_2(FK) |------+   | id_2(FK) |   | id_2(FK) |   | id_2(FK) |
+----------+          +----------+   +----------+   +----------+
In this design each entity is represented by its own table. The common properties of the entities are not extracted into a separate table. Instead, a separate table is created for each many-to-many relation between a pair of entities, e.g. TableAB represents the link between TableA and TableB, TableBC represents the link between TableB and TableC, and so on. In this case there will be 6 link tables in total (TableAB, TableAC, TableAD, TableBC, TableBD and TableCD) representing the many-to-many relations between the 4 entity tables TableA, TableB, TableC and TableD.
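The count of 6 comes from choosing 2 entities out of 4, i.e. n(n-1)/2 link tables for n entities. A small sketch (the function name `link_tables` is made up for illustration) shows how quickly this grows when entities are added:

```python
from itertools import combinations

def link_tables(entities):
    """Return one link-table name per unordered pair of entities."""
    return [f"Table{a}{b}" for a, b in combinations(entities, 2)]

four = link_tables(["A", "B", "C", "D"])
five = link_tables(["A", "B", "C", "D", "E"])
print(four)       # the six tables listed in the question
print(len(five))  # adding a fifth entity adds four more link tables
```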
From the above two designs I can think of the following pros and cons for each:
First design:
Pros:
Fewer tables are created in the design.
Any change in the common properties of the entities has to be made in only one table, TableCommon.
Adding a new entity to the design is easy.
Cons:
All additions, updates and deletions have to go through a single table, TableCommon, to maintain referential integrity. That could be a bottleneck.
Adding an entry for an entity requires inserts into two tables.
Second design:
Pros:
Each entity is represented by a separate table, hence there is no bottleneck during additions, updates and deletions.
Adding an entry for an entity requires an insert into a single table only.
Cons:
Too many tables are created for storing references between entities.
Adding a new entity is cumbersome.
Changing the common properties of the entities has to be done in all the entity tables.
Which of the above designs is better, or is there some other, even better approach? Here "better" is in terms of performance, storage space, maintenance and scalability.

I think you have already answered many parts of your question very well. I will just add a few other points.
Note 1: about the TableCommon strategy
I strongly recommend using TableCommon to store the common fields. It has few side effects on your evaluation parameters (performance, redundancy, scalability, maintenance, etc.).
Note 2: about the Link table strategy
The parameters that are important here:
The number of records in entities A, B, C and D
The number of records in the many-to-many relationships among them
The number of CRUD operations on these many-to-many relationships
If there are a lot of records and a lot of CRUD operations, and their performance is vital, you should not use the Link table strategy.
However, if only a few pairs of tables (for example EntityA and EntityB) have a lot of records in their many-to-many relationship, you can use the EntityAB strategy only for them and use the Link table strategy for the others.
Note 3: using a fact table between entities A, B, C and D
I know that it looks like a very bad design at first glance.
But, based on the evaluation parameters, it can be useful in some cases.
Use a fact table like this:
Gather the FKs of all entities A, B, C and D in one table.
Cons:
If the many-to-many relationships carry many other fields, we cannot use this strategy.
There will be many NULLs in the fact table.
Pros:
You can fetch all EntityA relationships in one record.
It reduces the number of entities.
It reduces the number of records.
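One possible reading of the fact table idea is sketched below with Python's sqlite3. The table name `Fact`, the column names and the sample ids are all assumptions made for illustration; the answer itself does not specify a layout. The sketch shows both the benefit (one row can carry several of an EntityA row's relationships) and the cost (unused FK columns stay NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical fact table: one nullable FK column per entity.
CREATE TABLE Fact (
    a_id INTEGER,  -- would reference TableA(id)
    b_id INTEGER,  -- would reference TableB(id)
    c_id INTEGER,  -- would reference TableC(id)
    d_id INTEGER   -- would reference TableD(id)
);
INSERT INTO Fact VALUES (1, 10, NULL, 100);   -- A1 related to B10 and D100
INSERT INTO Fact VALUES (1, NULL, 20, NULL);  -- A1 related to C20
INSERT INTO Fact VALUES (NULL, 11, 20, NULL); -- B11 related to C20
""")

# All relationships of EntityA row 1, several per record.
a_rows = conn.execute(
    "SELECT b_id, c_id, d_id FROM Fact WHERE a_id = ? ORDER BY rowid", (1,)
).fetchall()

# Count the NULL cells the layout forces on these three rows.
null_count = conn.execute(
    "SELECT SUM((a_id IS NULL) + (b_id IS NULL) + (c_id IS NULL) + (d_id IS NULL)) "
    "FROM Fact"
).fetchone()[0]
print(a_rows, null_count)
```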

Related

Left Join Not Returning Expected Results

I have a table of Vendors (Vendors):
+-----+-------------+
| ID  | Vendor      |
+-----+-------------+
| 1   | ABC Company |
| 2   | DEF Company |
| 3   | GHI Company |
| ... | ...         |
+-----+-------------+
and a table of services (AllServices):
+-----+------------+
| ID  | Service    |
+-----+------------+
| 1   | Automotive |
| 2   | Medical    |
| 3   | Financial  |
| ... | ...        |
+-----+------------+
and a table that links the two (VendorServices):
+-----------+-----------+
| VendorID  | ServiceID |
+-----------+-----------+
| 1         | 1         |
| 1         | 3         |
| 3         | 2         |
| ...       | ...       |
+-----------+-----------+
Note that one company may provide multiple services while some companies may not provide any of the listed services.
The query results I want would be, for a given Vendor:
+------------+----------+
| Service ID | Provided |
+------------+----------+
| 1          | 0        |
| 2          | 0        |
| 3          | 1        |
| ...        | ...      |
+------------+----------+
Where ALL of the services are listed and the ones that the given vendor provides would have a 1 in the Provided column, otherwise a zero.
Here's what I've got so far:
SELECT
VendorServices.ServiceID,
<Some Function> AS Provided
FROM
AllServices LEFT JOIN VendorServices ON AllServices.ID = VendorServices.ServiceID
WHERE
VendorServices.VendorID = #VendorID
ORDER BY
Service
I have two unknowns:
The above query does not return every entry in the AllServices table; and
I don't know how to write the function for the Provided column.
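You need a LEFT JOIN of AllServices to VendorServices with the VendorID condition moved from the WHERE clause into the ON clause (a WHERE filter on VendorServices.VendorID discards the unmatched rows, because that column is NULL for them), plus a CASE expression to compute the Provided column: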
select s.id,
case when v.serviceid is null then 0 else 1 end provided
from AllServices s left join VendorServices v
on v.serviceid = s.id and v.vendorid = #VendorID
See the demo.
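The difference between filtering in WHERE and filtering in ON can be reproduced with the sample data from the question. This is a small sketch using Python's sqlite3 rather than the asker's database; the column types and the choice of vendor 1 are assumptions, and `?` stands in for the `#VendorID` placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AllServices (ID INTEGER PRIMARY KEY, Service TEXT);
CREATE TABLE VendorServices (VendorID INTEGER, ServiceID INTEGER);
INSERT INTO AllServices VALUES (1, 'Automotive'), (2, 'Medical'), (3, 'Financial');
INSERT INTO VendorServices VALUES (1, 1), (1, 3), (3, 2);
""")

# Filter in WHERE: rows without a match have v.VendorID = NULL and are dropped,
# so the LEFT JOIN behaves like an INNER JOIN.
FILTER_IN_WHERE = """
SELECT s.ID
FROM AllServices s LEFT JOIN VendorServices v ON v.ServiceID = s.ID
WHERE v.VendorID = ?
ORDER BY s.ID
"""

# Filter in ON: every service is kept; unmatched ones get NULLs from v.
FILTER_IN_ON = """
SELECT s.ID,
       CASE WHEN v.ServiceID IS NULL THEN 0 ELSE 1 END AS Provided
FROM AllServices s LEFT JOIN VendorServices v
     ON v.ServiceID = s.ID AND v.VendorID = ?
ORDER BY s.ID
"""

vendor_id = 1
where_rows = conn.execute(FILTER_IN_WHERE, (vendor_id,)).fetchall()
on_rows = conn.execute(FILTER_IN_ON, (vendor_id,)).fetchall()
print(where_rows)  # only the services vendor 1 provides
print(on_rows)     # all services, flagged 1 or 0
```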

SQL Server: How to unpivot from pivoted table back to a self referencing table

I've looked at examples from: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver15 but I couldn't seem to find samples of what I'm trying to do.
I'm wondering if there's a way to unpivot from this:
+----+------------+--------+--------+--------+
| Id | Level0     | Level1 | Level2 | Level3 |
+----+------------+--------+--------+--------+
| 0  | TMI        |        |        |        |
+----+------------+--------+--------+--------+
| 1  | TMI        | A      |        |        |
+----+------------+--------+--------+--------+
| 2  | TMI        | A      | B      |        |
+----+------------+--------+--------+--------+
| 3  | TMI        | A      | B      | C      |
+----+------------+--------+--------+--------+
| 4  | TMI        | A      | B      | D      |
+----+------------+--------+--------+--------+
Back to self referencing table like this:
+----+-----------+----------+--------+
| Id | LevelName | ParentId | Level  |
+----+-----------+----------+--------+
| 0  | TMI       |          | Level0 |
+----+-----------+----------+--------+
| 1  | A         | 0        | Level1 |
+----+-----------+----------+--------+
| 2  | B         | 1        | Level2 |
+----+-----------+----------+--------+
| 3  | C         | 2        | Level3 |
+----+-----------+----------+--------+
| 4  | D         | 2        | Level3 |
+----+-----------+----------+--------+

compare two tables in a query taking so long time

I have a query that compares 350 rows against another table with 50000 rows, and it takes 5 minutes to return a result. Is there a faster way to retrieve the values?
select lower(rtrim(substring(rt_queue, 3, 6))) + " Ticor - " + convert(varchar(4), sort_grp_id) + " " + rtrim(batch_no) + ".prn" as FILENAME
from tprt_queue
where cycle_date >= (select CYCLE from tb_jpachi_cycle)
  and batch_no <> 'JOBCTR-'
  and rt_queue not like '%CN%'
  and status = 'TINTED'
  and lower(rtrim(substring(prt_queue, 3, 6))) + " Bicor - " + convert(varchar(4), sort_grp_id) + " " + rtrim(batch_no) + ".prn" not in (
      select FILENAME from tb_jpachi_filesprocess)
order by rt_dt
This one returns 100000 rows: select FILENAME from tb_jpachi_filesprocess
The row count will keep increasing day by day.
Here is the query plan:
QUERY PLAN FOR STATEMENT 1 (at line 1).
STEP 1
The type of query is EXECUTE.
Executing a newly cached statement (SSQL_ID = 230588804).
QUERY PLAN FOR STATEMENT 1 (at line 0).
STEP 1
The type of query is DECLARE.
QUERY PLAN FOR STATEMENT 2 (at line 1).
Optimized using Serial Mode
STEP 1
The type of query is SELECT.
10 operator(s) under root
|ROOT:EMIT Operator (VA = 10)
|
| |RESTRICT Operator (VA = 9)(0)(0)(0)(0)(9)
| |
| | |SEQUENCER Operator (VA = 8) has 2 children.
| | |
| | | |SCALAR AGGREGATE Operator (VA = 1)
| | | | Evaluate Ungrouped ONCE AGGREGATE.
| | | |
| | | | |SCAN Operator (VA = 0)
| | | | | FROM TABLE
| | | | | tb_jpachi_cycle
| | | | | Table Scan.
| | | | | Forward Scan.
| | | | | Positioning at start of table.
| | | | | Using I/O Size 2 Kbytes for data pages.
| | | | | With LRU Buffer Replacement Strategy for data pages.
| | |
| | | |SQFILTER Operator (VA = 7) has 2 children.
| | | |
| | | | |RESTRICT Operator (VA = 3)(0)(0)(0)(6)(0)
| | | | |
| | | | | |SCAN Operator (VA = 2)
| | | | | | FROM TABLE
| | | | | | tprt_queue
| | | | | | Using Clustered Index.
| | | | | | Index : pk_tprt_queue
| | | | | | Forward Scan.
| | | | | | Positioning at index start.
| | | | | | Using I/O Size 16 Kbytes for index leaf pages.
| | | | | | With LRU Buffer Replacement Strategy for index leaf pages.
| | | | | | Using I/O Size 16 Kbytes for data pages.
| | | | | | With LRU Buffer Replacement Strategy for data pages.
| | | |
| | | | Run subquery 1 (at nesting level 1).
| | | |
| | | | QUERY PLAN FOR SUBQUERY 1 (at nesting level 1 and at line 2).
| | | |
| | | | Correlated Subquery.
| | | | Subquery under an IN predicate.
| | | |
| | | | |SCALAR AGGREGATE Operator (VA = 6)
| | | | | Evaluate Ungrouped ANY AGGREGATE.
| | | | | Scanning only up to the first qualifying row.
| | | | |
| | | | | |RESTRICT Operator (VA = 5)(9)(0)(0)(15)(0)
| | | | | |
| | | | | | |SCAN Operator (VA = 4)
| | | | | | | FROM TABLE
| | | | | | | tb_jpachi_filesprocess
| | | | | | | Table Scan.
| | | | | | | Forward Scan.
| | | | | | | Positioning at start of table.
| | | | | | | Using I/O Size 16 Kbytes for data pages.
| | | | | | | With LRU Buffer Replacement Strategy for data pages.
| | | |
| | | | END OF QUERY PLAN FOR SUBQUERY 1.

How to display "The Many" of a One-to-Many on One Line

I was provided 2 files, as two tables: 'VoterData' and 'VoterHistory' - What is the best way to accomplish my expected display?
EXPECTED DISPLAY
ID      | First Name | Last Name   | Election1 | Election2 | Election3
--------+------------+-------------+-----------+-----------+----------
2155077 | Camille    | Bocchicchio | 2016June7 | 2016Nov8  | 2018June5
2155079 | Manabu     | Lonny       | 2016June7 | 2016Nov8  |
2155083 | Scott      | Bosomworth  | 2016June7 |           | 2018June5
ONE- 'VoterData'
lVoterUniqueID | szNameFirst | szNameLast
---------------+-------------+------------
2155077        | Camille     | Bocchicchio
2155079        | Manabu      | Lonny
2155083        | Scott       | Bosomworth
MANY- 'VoterHistory'
lVoterUniqueID | sElectionAbbr
---------------+---------------
2155077        | 2016June7
2155077        | 2016Nov8
2155077        | 2018June5
2155079        | 2016June7
2155079        | 2016Nov8
2155083        | 2016June7
2155083        | 2018June5
Use a crosstab query:
TRANSFORM First(H.sElectionAbbr) AS FirstOfsElectionAbbr
SELECT H.lVoterUniqueID AS ID, D.szNameFirst AS [First Name], D.szNameLast AS [Last Name]
FROM VoterData AS D INNER JOIN VoterHistory AS H ON D.lVoterUniqueID = H.lVoterUniqueID
GROUP BY H.lVoterUniqueID, D.szNameFirst, D.szNameLast
PIVOT H.sElectionAbbr;
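TRANSFORM ... PIVOT is specific to Access. On engines without it, a similar one-row-per-voter result can be sketched with conditional aggregation. The example below uses Python's sqlite3 and the sample rows from the question; the column types and the generated column names `Election1`, `Election2`, ... are assumptions for illustration, and this is a different technique from the crosstab query above, not a translation of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VoterData (lVoterUniqueID INTEGER PRIMARY KEY,
                        szNameFirst TEXT, szNameLast TEXT);
CREATE TABLE VoterHistory (lVoterUniqueID INTEGER, sElectionAbbr TEXT);
INSERT INTO VoterData VALUES
    (2155077, 'Camille', 'Bocchicchio'),
    (2155079, 'Manabu', 'Lonny'),
    (2155083, 'Scott', 'Bosomworth');
INSERT INTO VoterHistory VALUES
    (2155077, '2016June7'), (2155077, '2016Nov8'), (2155077, '2018June5'),
    (2155079, '2016June7'), (2155079, '2016Nov8'),
    (2155083, '2016June7'), (2155083, '2018June5');
""")

# The distinct elections become the pivoted columns.
elections = [row[0] for row in conn.execute(
    "SELECT DISTINCT sElectionAbbr FROM VoterHistory ORDER BY sElectionAbbr")]

# One MAX(CASE ...) per election: the value if the voter took part, else NULL.
columns = ", ".join(
    f'MAX(CASE WHEN H.sElectionAbbr = ? THEN H.sElectionAbbr END) AS "Election{i}"'
    for i in range(1, len(elections) + 1)
)
query = f"""
SELECT D.lVoterUniqueID, D.szNameFirst, D.szNameLast, {columns}
FROM VoterData AS D
INNER JOIN VoterHistory AS H ON D.lVoterUniqueID = H.lVoterUniqueID
GROUP BY D.lVoterUniqueID, D.szNameFirst, D.szNameLast
ORDER BY D.lVoterUniqueID
"""
rows = conn.execute(query, elections).fetchall()
for row in rows:
    print(row)
```

The list of elections has to be known before the query is built, which is why it is read in a first step here.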

Connecting two tables

I'm designing a database for a Cab/Taxi service. There's a table for taxi service details.
*cab_services*
+---------------------+
| SID   | Name        |
|---------------------|
| S001  | ABC Taxi    |
| S002  | XYZ Cabs    |
| S003  | MN Taxi     |
| S004  | OP Cabs     |
|_______|_____________|
And there's another table for locations.
locations
+-----------------------------------+
| LID   | Code   | Location         |
|----------------|------------------|
| L001  | CO     | Akarawita        |
| L002  | CO     | Angamuwa         |
| L003  | CO     | Batawala         |
| L004  | CO     | Avissawella      |
| L005  | CO     | Battaramulla     |
| L006  | GQ     | Ambepussa        |
| L007  | GQ     | Bemmulla         |
| L008  | GQ     | Biyagama         |
| L009  | GQ     | Alawala          |
| L010  | GQ     | Andiambalama     |
| L011  | GQ     | Biyagama IPZ     |
| L012  | KT     | Bellana          |
| L013  | KT     | Bolossagama      |
| L014  | KT     | Bombuwala        |
| L015  | KT     | Alutgama         |
| L016  | KT     | Alubomulla       |
|_______|________|__________________|
Note that they are categorized according to districts. (CO, GQ, KT) Each district has multiple towns/cities.
A taxi service may provide its service in multiple districts, and one district may have multiple taxi services. It's sort of a many-to-many scenario.
I'm trying to connect the cab_services table with the locations table. But I can't figure out how to.
I would have done something like this if each service operated in only one district.
+-------+-------------+---------+
| SID   | Name        | Locs    |
|-------+-------------+---------|
| S001  | ABC Taxi    | CO      |
|_______|_____________|_________|
But like I said before, a service can have many districts.
+-------+-------------+---------+
| SID   | Name        | Locs    |
|-------+-------------+---------|
| S001  | ABC Taxi    | CO, KT  |
|_______|_____________|_________|
This would violate the 1NF.
I want to be able to handle a situation like this: if a user searches by location name, they should get the cab services in that area.
What changes do I have to do in my database, table structure to accomplish this?
Please let me know if some part is confusing, I'll try my best to clarify it further. I'm pretty bad at explaining things. Thank you.
It seems your connecting table only needs the FK columns referencing the cab_services table and the locations table (note what Oded states about duplication). For example, if "ABC Taxi" is available in ALL "CO" locations, the connecting table would have the following records:
SID | LID
-----------
S001 | L001
S001 | L002
S001 | L003
S001 | L004
S001 | L005
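A connecting table like this, and the search by location name the question asks for, can be sketched as follows with Python's sqlite3. The table name `service_locations`, the helper `services_in`, the column types and the small sample of rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE cab_services (SID TEXT PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE locations (LID TEXT PRIMARY KEY, Code TEXT NOT NULL,
                        Location TEXT NOT NULL);
-- Connecting table: one row per (service, location) pair.
CREATE TABLE service_locations (
    SID TEXT NOT NULL REFERENCES cab_services(SID),
    LID TEXT NOT NULL REFERENCES locations(LID),
    PRIMARY KEY (SID, LID)
);
INSERT INTO cab_services VALUES ('S001', 'ABC Taxi'), ('S002', 'XYZ Cabs');
INSERT INTO locations VALUES
    ('L001', 'CO', 'Akarawita'),
    ('L002', 'CO', 'Angamuwa'),
    ('L012', 'KT', 'Bellana');
INSERT INTO service_locations VALUES
    ('S001', 'L001'), ('S001', 'L002'),
    ('S002', 'L002'), ('S002', 'L012');
""")

def services_in(conn, location_name):
    """Return (SID, Name) for every cab service operating in the location."""
    return conn.execute(
        """
        SELECT s.SID, s.Name
        FROM cab_services s
        JOIN service_locations sl ON sl.SID = s.SID
        JOIN locations l ON l.LID = sl.LID
        WHERE l.Location = ?
        ORDER BY s.SID
        """,
        (location_name,),
    ).fetchall()

angamuwa = services_in(conn, "Angamuwa")
bellana = services_in(conn, "Bellana")
print(angamuwa)
print(bellana)
```

Because names live only in cab_services and locations, nothing is duplicated in the connecting table apart from the two keys.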
You will have multiple entries in the connecting table.
SID Name Locs
-----------------------
S001 ABC Taxi CO
S001 ABC Taxi KT
This is still in 1NF, though you are duplicating data by having SID and Name in the table.
