How to represent data with meaning dependent on other columns?


This question is about relational databases like PostgreSQL or Oracle. The following is a toy example of my problem. Say I have five tables:
Client (id, name)
1 Joe
2 Ted
Factory (id, name, salary)
1 BMW 20
2 Porsche 30
Farm (id, name, salary)
1 Wineyard 10
2 Cattle farm 5
Occupation (id, name)
1 Farmer
2 Worker
Client_Occupation (client-id, occupation-id, dependent-id)
1 (Joe) 1 (Farmer) 1 (Wineyard)
2 (Ted) 2 (Worker) 2 (Porsche)
To find out how much Joe earns, the SQL needs to use data from either the Farm table (if the occupation id is 1) or the Factory table (if the occupation id is 2). This creates very convoluted queries and makes the processing code more complex than it should be.
Is there a better way to structure this data if I do not want to merge the factories and farms tables?
In other words, the Client_Occupation table has a conditional relation to either Farm or Factory. Is there a better way to represent this information?
Client
^
|
Factory <- Client_Occupation -> Farm
|
v
Occupation

This comes up often when modeling OO hierarchies, and is a table inheritance pattern. Generally, I prefer Class Table Inheritance over Single Table Inheritance, but they both have their use cases.
Single Table Inheritance is really easy: just take your Farm and Factory tables and merge them together into a superset. Done and done. Unfortunately, niceties such as constraints become difficult, and you'll find yourself writing a lot of CASE expressions on the discriminator column.
Class Table Inheritance takes a little time to grok, but is actually pretty simple as well. You design a base table with the common attributes and a discriminator column, and then "sub" tables with the specific attributes for each level of the hierarchy. The sub tables are linked back to the base table by Id and Discriminator, which provides some query optimizer gains and prevents a single entity from being of multiple types.
For your example problem, you really don't have any specific attributes on the leaves - so this will look a little odd, but the pattern still works (note that I'm renaming "dependent-id" to "location-id" to make it more understandable):
Occupation (id, name)
1 Farmer
2 Worker
# the base table for all locations w/common attributes
Location (id, type, name, salary)
# check constraint on the discriminator column, plus a unique constraint on the combination of id and type so that we can reference it in a foreign key
PK(id), CHECK type IN ('Farm', 'Factory'), UC(id, type)
1 Farm Wineyard 10
2 Farm Cattle farm 5
3 Factory BMW 20
4 Factory Porsche 30
Client_Occupation (client-id, occupation-id, location-id)
FK location-id => location.id
1 (Joe) 1 (Farmer) 1 (Wineyard)
2 (Ted) 2 (Worker) 4 (Porsche)
Farm
# carry the discriminator column and check constraint it; any Farm specific columns can be added here
PK(id), FK(id, type) => location(id, type), CHECK type = 'Farm'
Farm (id, type)
1 Farm
2 Farm
Factory
# carry the discriminator column and check constraint it; any Factory specific columns can be added here
PK(id), FK(id, type) => location(id, type), CHECK type = 'Factory'
Factory (id, type)
3 Factory
4 Factory
You can now easily get salary for everything applicable:
SELECT * FROM Client_Occupation CO JOIN Location L ON CO."location-id" = L.id
And, if you want to get specific Farm columns, or limit it to Farm locations:
SELECT * FROM Client_Occupation CO JOIN Location L ON CO."location-id" = L.id
JOIN Farm F ON L.id = F.id AND L.type = F.type -- technically, joining on type is unnecessary, but I prefer to include it anyway
And, if you want to emulate the Single Table inheritance model:
SELECT * FROM Client_Occupation CO JOIN Location L ON CO."location-id" = L.id
LEFT OUTER JOIN Farm F ON L.id = F.id AND L.type = F.type
LEFT OUTER JOIN Factory T ON L.id = T.id AND L.type = T.type
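A minimal runnable sketch of this schema, using Python's built-in sqlite3 module as a stand-in for PostgreSQL or Oracle (an assumption; types and constraint syntax may need adjusting there). Hyphens in column names are replaced with underscores, the data is the question's (Joe at the Wineyard, location 1; Ted at Porsche, location 4), and the unique constraint is placed on (id, type) only. It checks two things: the salary lookup needs a single join, and the composite (id, type) foreign key rejects registering a Factory location as a Farm.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL/Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
CREATE TABLE Location (
    id     INTEGER PRIMARY KEY,
    type   TEXT NOT NULL CHECK (type IN ('Farm', 'Factory')),
    name   TEXT NOT NULL,
    salary INTEGER NOT NULL,
    UNIQUE (id, type)               -- target for the composite FKs below
);
CREATE TABLE Farm (
    id   INTEGER PRIMARY KEY,
    type TEXT NOT NULL DEFAULT 'Farm' CHECK (type = 'Farm'),
    FOREIGN KEY (id, type) REFERENCES Location (id, type)
);
CREATE TABLE Factory (
    id   INTEGER PRIMARY KEY,
    type TEXT NOT NULL DEFAULT 'Factory' CHECK (type = 'Factory'),
    FOREIGN KEY (id, type) REFERENCES Location (id, type)
);
CREATE TABLE Client_Occupation (
    client_id     INTEGER NOT NULL,
    occupation_id INTEGER NOT NULL,
    location_id   INTEGER NOT NULL REFERENCES Location (id)
);
INSERT INTO Location VALUES
    (1, 'Farm', 'Wineyard', 10), (2, 'Farm', 'Cattle farm', 5),
    (3, 'Factory', 'BMW', 20), (4, 'Factory', 'Porsche', 30);
INSERT INTO Farm (id) VALUES (1), (2);
INSERT INTO Factory (id) VALUES (3), (4);
INSERT INTO Client_Occupation VALUES (1, 1, 1), (2, 2, 4);
""")

# One join answers "how much does each client earn?" whatever the location type.
salaries = dict(conn.execute("""
    SELECT CO.client_id, L.salary
    FROM Client_Occupation CO JOIN Location L ON CO.location_id = L.id
""").fetchall())

# Location 3 is a Factory, so there is no Location row (3, 'Farm') to reference.
try:
    conn.execute("INSERT INTO Farm (id) VALUES (3)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The type column in Farm and Factory is redundant data, but it is what lets an ordinary foreign key enforce "this id really is a Farm".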
On occasion, it's difficult to refactor an existing table design into this pattern. In that case, a view for your base table provides a lot of the same querying benefits:
CREATE VIEW Location AS
SELECT id, 'Farm' as type, name, salary FROM Farm
UNION ALL
SELECT id, 'Factory' as type, name, salary FROM Factory
At that point, you will (probably) have conflicting ids (e.g., a FarmId = 1 and a FactoryId = 1), so you'll need to be diligent about including the type column in any joins.
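A sketch of the view route in the same style (sqlite3 as a stand-in; the location_type column on Client_Occupation is a hypothetical addition). With the question's original numbering, Farm 1 and Factory 1 collide, so the referencing table has to carry the type next to the id, and joins must use both.

```python
import sqlite3

# Keep the separate Farm and Factory tables and expose a Location view over them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Farm    (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
CREATE TABLE Factory (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
INSERT INTO Farm    VALUES (1, 'Wineyard', 10), (2, 'Cattle farm', 5);
INSERT INTO Factory VALUES (1, 'BMW', 20), (2, 'Porsche', 30);

CREATE VIEW Location AS
    SELECT id, 'Farm' AS type, name, salary FROM Farm
    UNION ALL
    SELECT id, 'Factory' AS type, name, salary FROM Factory;

-- Hypothetical: the type travels with the id because ids overlap.
CREATE TABLE Client_Occupation (
    client_id     INTEGER,
    location_type TEXT CHECK (location_type IN ('Farm', 'Factory')),
    location_id   INTEGER
);
INSERT INTO Client_Occupation VALUES (1, 'Farm', 1), (2, 'Factory', 2);
""")

# Joining on (type, id) picks exactly one location per client.
rows = conn.execute("""
    SELECT CO.client_id, L.name, L.salary
    FROM Client_Occupation CO
    JOIN Location L ON L.type = CO.location_type AND L.id = CO.location_id
    ORDER BY CO.client_id
""").fetchall()

# Joining on id alone matches both Farm 1 and Factory 1 for client 1.
ambiguous = conn.execute("""
    SELECT COUNT(*) FROM Client_Occupation CO
    JOIN Location L ON L.id = CO.location_id
    WHERE CO.client_id = 1
""").fetchone()[0]
```

UNION ALL is used because the two branches can never produce identical rows (the type literal differs), so there is nothing for UNION to deduplicate.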

Ideally, Farm and Factory should be merged into a single Workplace table with an additional attribute called Type, which could take the value Farm or Factory. The table would look somewhat like this:
Workplace
---------
Id Name Salary Type
-- ---- ------ ----
1 BMW 20 Factory
2 Cattle Farm 5 Farm
.
.
Your problem's root is bad design, not a lack of skill to write good enough queries.
Now that the question makes clear that you can't merge Farm and Factory, to answer it:
Is there a better way to structure this data if I do not want to merge the factories and farms tables?
No matter how you structure the data, as long as the tables are separate you will have to write the logic to figure out which table to query. So that convolution is inevitable if you keep Farm and Factory separate and otherwise follow proper design. But you can hide it behind the scenes: create views for the queries on tables you often have to join. Then, rather than querying across three tables, you write the query against one table and a view built from the other two.

Related

How to implement many-to-many-to-many database relationship?

I am building a SQLite database and am not sure how to proceed with this scenario.
I'll use a real-world example to explain what I need:
I have a list of products that are sold by many stores in various states. Not every store sells a particular product at all, and those that do may only sell it in one state or another. Most stores sell a product in most states, but not all.
For example, let's say I am trying to buy a vacuum cleaner in Hawaii. Joe's Hardware sells vacuums in 18 states, but not in Hawaii. Walmart sells vacuums in Hawaii, but not microwaves. Burger King does not sell vacuums at all, but will give me a Whopper anywhere in the US.
So if I am in Hawaii and search for a vacuum, I should only get Walmart as a result. While other stores may sell vacuums, and may sell in Hawaii, they don't do both but Walmart does.
How do I efficiently create this type of relationship in a relational database? (Specifically, I am currently using SQLite, but need to be able to convert to MySQL in the future.)
Obviously, I would need tables for Product, Store, and State, but I am at a loss on how to create and query the appropriate join tables...
If I, for example, query a certain Product, how would I determine which Store would sell it in a particular State, keeping in mind that Walmart may not sell vacuums in Hawaii, but they do sell tea there?
I understand the basics of 1:1, 1:n, and m:n relationships in relational databases, but I am not sure how to handle this complexity where there is a many-to-many-to-many situation.
If you could show some SQL statements (or DDL) that demonstrates this, I would be very grateful. Thank you!

An accepted and common way is to use a table that has a column referencing the product and another referencing the store. There are many names for such a table: reference table, associative table and mapping table, to name a few.
You want these to be efficient, so try to reference by a number, which of course has to uniquely identify what it is referencing. In SQLite a table by default has a special, normally hidden column that is such a unique number. It's the rowid, and it is typically the most efficient way of accessing rows, as SQLite has been designed with this common usage in mind.
SQLite allows you to create one column per table that is an alias of the rowid: you simply define the column as INTEGER PRIMARY KEY, and typically you'd name the column id.
So, utilising these, the reference table would have a column for the product's id and another for the store's id, catering for every combination of product/store.
As an example, three tables are created (stores, products and a reference/mapping table), and the first two are populated using :-
CREATE TABLE IF NOT EXISTS _products(id INTEGER PRIMARY KEY, productname TEXT, productcost REAL);
CREATE TABLE IF NOT EXISTS _stores (id INTEGER PRIMARY KEY, storename TEXT);
CREATE TABLE IF NOT EXISTS _product_store_relationships (storereference INTEGER, productreference INTEGER);
INSERT INTO _products (productname,productcost) VALUES
('thingummy',25.30),
('Sky Hook',56.90),
('Tartan Paint',100.34),
('Spirit Level Bubbles - Large', 10.43),
('Spirit Level bubbles - Small',7.77)
;
INSERT INTO _stores (storename) VALUES
('Acme'),
('Shops-R-Them'),
('Harrods'),
('X-Mart')
;
At this point _products and _stores hold the rows inserted above, while _product_store_relationships is still empty.
Placing products into stores (for example) could be done using :-
-- Build some relationships/references/mappings
INSERT INTO _product_store_relationships (productreference, storereference) VALUES
(2,2), -- Sky Hooks are in Shops-R-Them
(2,4), -- Sky Hooks in X-Mart
(1,3), -- thingummys in Harrods
(1,1), -- and Acme
(1,2), -- and Shops-R-Them
(4,4), -- Spirit Level Bubbles Large in X-Mart
(5,4), -- Spirit Level Bubbles Small in X-Mart
(3,3) -- Tartan Paint in Harrods
;
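A sketch of the same mapping table with constraints added (an addition to the answer, not part of it): a composite primary key rejects duplicate product/store pairs, and REFERENCES clauses reject ids that don't exist. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. It runs through Python's sqlite3 module so the constraints can be exercised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE _products (id INTEGER PRIMARY KEY, productname TEXT, productcost REAL);
CREATE TABLE _stores   (id INTEGER PRIMARY KEY, storename TEXT);
CREATE TABLE _product_store_relationships (
    storereference   INTEGER NOT NULL REFERENCES _stores (id),
    productreference INTEGER NOT NULL REFERENCES _products (id),
    PRIMARY KEY (storereference, productreference)   -- no duplicate pairs
);
INSERT INTO _products (productname, productcost) VALUES
    ('thingummy', 25.30), ('Sky Hook', 56.90), ('Tartan Paint', 100.34),
    ('Spirit Level Bubbles - Large', 10.43), ('Spirit Level bubbles - Small', 7.77);
INSERT INTO _stores (storename) VALUES ('Acme'), ('Shops-R-Them'), ('Harrods'), ('X-Mart');
-- Naming the columns makes the (product, store) order of each pair explicit.
INSERT INTO _product_store_relationships (productreference, storereference) VALUES
    (2,2), (2,4), (1,3), (1,1), (1,2), (4,4), (5,4), (3,3);
""")

def insert_fails(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Column order is (storereference, productreference).
duplicate_rejected = insert_fails(
    "INSERT INTO _product_store_relationships VALUES (2, 2)")  # pair already stored
dangling_rejected = insert_fails(
    "INSERT INTO _product_store_relationships VALUES (9, 1)")  # there is no store 9

sky_hook_stores = [r[0] for r in conn.execute("""
    SELECT storename FROM _stores
    JOIN _product_store_relationships ON _stores.id = storereference
    JOIN _products ON productreference = _products.id
    WHERE productname = 'Sky Hook'
    ORDER BY storename
""")]
```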
The _product_store_relationships table would then hold one row for each of those product/store pairs.
A query such as the following would list the products in stores sorted by store and then product :-
SELECT storename, productname, productcost FROM _stores
JOIN _product_store_relationships ON _stores.id = storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
ORDER BY storename, productname
;
This query will only list products whose name contains an s or S (in SQLite, LIKE is case-insensitive for ASCII characters by default), with the output sorted by productcost in ascending order, then storename, then productname :-
SELECT storename, productname, productcost FROM _stores
JOIN _product_store_relationships ON _stores.id = storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
WHERE productname LIKE '%s%'
ORDER BY productcost,storename, productname
;
Expanding the above to consider states.
Two new tables, _states and _store_state_references, are added.
Strictly, there's no real need for a reference table here (a store would only be in one state), unless you consider a chain of stores to be a store, in which case this design also copes.
The SQL could be :-
CREATE TABLE IF NOT EXISTS _states (id INTEGER PRIMARY KEY, statename TEXT);
INSERT INTO _states (statename) VALUES
('Texas'),
('Ohio'),
('Alabama'),
('Queensland'),
('New South Wales')
;
CREATE TABLE IF NOT EXISTS _store_state_references (storereference, statereference);
INSERT INTO _store_state_references VALUES
(1,1),
(2,5),
(3,1),
(4,3)
;
If the following query were run :-
SELECT storename,productname,productcost,statename
FROM _stores
JOIN _store_state_references ON _stores.id = _store_state_references.storereference
JOIN _states ON _store_state_references.statereference =_states.id
JOIN _product_store_relationships ON _stores.id = _product_store_relationships.storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
WHERE statename = 'Texas' AND productname = 'Sky Hook'
;
Without the WHERE clause, the query lists every product of every store, once for each state the store is in.
The following would make Shops-R-Them have a presence in all states :-
INSERT INTO _store_state_references VALUES
(2,1),(2,2),(2,3),(2,4)
;
Now the Sky Hook in Texas query includes Shops-R-Them in its results.
Note: this just covers the basics of the topic.

You will need to create a combined mapping table of products, states and stores, e.g. tbl_product_states_stores, which stores each valid product/state/store combination. The columns would be id, product_id, state_id, stores_id.
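A minimal sketch of that combined table, run through Python's sqlite3 module with the question's hypothetical Walmart/Hawaii data (Ohio is an illustrative second state). The products, stores and states table names are illustrative, and the UNIQUE constraint on the three reference columns is an added assumption so the same combination can't be stored twice. Because availability is recorded per state, "vacuum in Hawaii" returns only Walmart, which separate product/store and store/state tables cannot express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE stores   (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE states   (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- One row per (product, state, store) combination that is actually on offer.
CREATE TABLE tbl_product_states_stores (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products (id),
    state_id   INTEGER NOT NULL REFERENCES states (id),
    stores_id  INTEGER NOT NULL REFERENCES stores (id),
    UNIQUE (product_id, state_id, stores_id)
);
INSERT INTO products (name) VALUES ('vacuum'), ('microwave'), ('tea'), ('Whopper');
INSERT INTO stores (name) VALUES ('Walmart'), ('Joe''s Hardware'), ('Burger King');
INSERT INTO states (name) VALUES ('Hawaii'), ('Ohio');
INSERT INTO tbl_product_states_stores (product_id, state_id, stores_id) VALUES
    (1, 1, 1),  -- Walmart: vacuums in Hawaii
    (3, 1, 1),  -- Walmart: tea in Hawaii (but no microwaves)
    (1, 2, 2),  -- Joe's Hardware: vacuums in Ohio, not Hawaii
    (4, 1, 3),  -- Burger King: Whoppers in Hawaii
    (4, 2, 3);  -- ...and in Ohio
""")

def stores_selling(product, state):
    """Names of stores that sell `product` in `state`."""
    return [r[0] for r in conn.execute("""
        SELECT s.name
        FROM tbl_product_states_stores pss
        JOIN products p ON p.id = pss.product_id
        JOIN states  st ON st.id = pss.state_id
        JOIN stores   s ON s.id = pss.stores_id
        WHERE p.name = ? AND st.name = ?
        ORDER BY s.name
    """, (product, state))]

print(stores_selling("vacuum", "Hawaii"))  # ['Walmart']
```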

SqlServer Many to Many AND

I have 3 (hypothetical) tables.
Photos (a list of photos)
Attributes (things describing the photos)
PhotosToAttributes (a table to link the first 2)
I want to retrieve the names of all the photos that have every attribute in a given list.
For example, all photos that have both dark lighting and are portraits (AttributeID 1 and 2). Or, for example, all photos that have dark lighting, are portraits and were taken at a wedding (AttributeID 1 and 2 and 5). Or any arbitrary number of attributes.
The scale of the database will be maybe 10,000 rows in Photos, 100 Rows in Attributes and 100,000 rows in PhotosToAttributes.
This question: SQL: Many-To-Many table AND query is very close. (I think.) I also read the linked answers about performance. That leads to something like the following. But, how do I get Name instead of PhotoID? And presumably my code (C#) will build this query and adjust the attribute list and count as necessary?
SELECT PhotoID
FROM PhotosToAttributes
WHERE AttributeID IN (1, 2, 5)
GROUP by PhotoID
HAVING COUNT(1) = 3
I'm a bit database illiterate (it's been 20 years since I took a database class); I'm not even sure this is a good way to structure the tables. I wanted to be able to add new attributes and photos at will without changing the data access code.

It is probably a reasonable way to structure the database. An alternative would be to keep all the attributes as a delimited list in a varchar field, but that would lead to performance issues when you search the field.
Your code is close. To take it to the final step, just join the other two tables like this:
Select p.Name, p.PhotoID
From Photos As p
Join PhotosToAttributes As pta On p.PhotoID = pta.PhotoID
Join Attributes As a On pta.AttributeID = a.AttributeID
Where a.Name In ('Dark Light', 'Portrait', 'Wedding')
Group By p.Name, p.PhotoID
Having Count(*) = 3;
By joining the Attributes table like that it means you can search for attributes by their name, instead of their ID.
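The question also asks how application code should build this query for an arbitrary attribute list. A sketch in Python with sqlite3 (a C# version would use parameterized commands the same way; the sample photos are made up): it emits one placeholder per attribute name and binds the expected count as a parameter. COUNT(DISTINCT ...) is a defensive change from COUNT(*) in case a photo/attribute link is ever stored twice.

```python
import sqlite3

def photos_with_all(conn, attribute_names):
    """Names of photos tagged with every attribute in attribute_names.

    One '?' placeholder per attribute, so values are bound as parameters
    instead of being spliced into the SQL text.
    """
    names = list(dict.fromkeys(attribute_names))  # drop duplicates, keep order
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT p.Name
        FROM Photos p
        JOIN PhotosToAttributes pta ON p.PhotoID = pta.PhotoID
        JOIN Attributes a ON pta.AttributeID = a.AttributeID
        WHERE a.Name IN ({placeholders})
        GROUP BY p.PhotoID, p.Name
        HAVING COUNT(DISTINCT a.AttributeID) = ?
        ORDER BY p.Name
    """
    return [r[0] for r in conn.execute(sql, (*names, len(names)))]

# Tiny hypothetical data set to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Photos (PhotoID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Attributes (AttributeID INTEGER PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE PhotosToAttributes (
    PhotoID     INTEGER REFERENCES Photos (PhotoID),
    AttributeID INTEGER REFERENCES Attributes (AttributeID),
    PRIMARY KEY (PhotoID, AttributeID)   -- also serves as the lookup index
);
INSERT INTO Photos VALUES (1, 'bride.jpg'), (2, 'cellar.jpg'), (3, 'groom.jpg');
INSERT INTO Attributes VALUES (1, 'Dark Light'), (2, 'Portrait'), (5, 'Wedding');
INSERT INTO PhotosToAttributes VALUES (1, 1), (1, 2), (1, 5), (2, 1), (3, 2), (3, 5);
""")
```

Removing duplicate names first matters: otherwise a list like ['Portrait', 'Portrait'] would demand a count of 2 that no photo can reach.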

First, create a view from your joins:
create view vw_PhotosWithAttributes
as
select
p.PhotoId,
a.AttributeID,
p.Name PhotoName,
a.Name AttributeName
from Photos p
inner join PhotosToAttributes pa on p.PhotoId = pa.PhotoId
inner join Attributes a on a.AttributeID = pa.AttributeID
You can then easily query by attribute, name or ID, but don't forget to index the relevant fields properly.

how to use case to combine spelling variations of an item in a table in sql

I have two SQL tables with different spellings of the department names. I need to combine them using CASE to create one spelling of the location name. Budget_Rc is the only one spelled the same in both tables. Here's an example:
Table-1
Depart_Name Room_Loc
1. Finance_P1 P144
2. Budget_Rc R2c
3. Payroll_P1_2 P1144
Table-2
Depart_Name Room_Loc
1. Fin_P1 P1444
2. Budget_Rc R2c
3. Finan_P1_1 P1444
4. PR_P1_2 P1140
What I need to achieve is for each department to be one entity with one room location, shown once in the main table (Table-1):
Depart_Name Room_Loc
1. Finance_P1 F144
2. Budget_Rc R2c
3. Payroll_P1_2 P1144
Many many thanks in advance!

I'd first try this:
-- a table variable is declared with @ (a # name would be a temp table)
DECLARE @AllSpellings TABLE(DepName VARCHAR(100));
INSERT INTO @AllSpellings(DepName)
SELECT Depart_Name FROM tbl1
UNION -- UNION already removes duplicates
SELECT Depart_Name FROM tbl2;
SELECT DepName
FROM @AllSpellings
ORDER BY DepName;
This will help you to find all existing values...
Next, create a clean table of all departments with an IDENTITY ID column.
Now you have two choices:
1) If you cannot change the tables' layout: use the SELECT statement above to find all existing entries and create a mapping table, which you can use as an indirect link.
2) Better, a real FK relation: replace the department names with the ID and make it a FOREIGN KEY reference.

Can more than one department be in a Room?
If so, it's harder, and you can't really write a dynamic query without a list of all the possible one-to-many relationships, e.g. Finance has the department key FIN and these three names. You will have to define that table to make any sort of relationship.
For instance:
DEPARTMENT TABLE
ID NAME ROOMID
FIN FINANCE P1444
PAY PAYROLL P1140
DEPARTMENTNAMES
ID DEPARTMENTNAME DEPARTMENTID
1 Finance_P1 FIN
2 Payroll_P1_2 PAY
3 Fin_P1 FIN
etc...
This way you can correctly match up all the departments and their names. I would use this match table to get the data organized and normalized, then clean up all your data and use a single department name from then on. It's going to be manual, but it should be a one-time job once the data is cleaned up.
If the room is only ever going to belong to one department you can join on the room which makes it a lot easier.

Since there does not appear to be any solid rule for mapping department names from table one to table two, the way I would approach this is to create a mapping table. This mapping table will relate the two department names.
mapping
Depart_Name_1 | Depart_Name_2
-----------------------------
Finance_P1 | Fin_P1
Budget_Rc | Budget_Rc
Payroll_P1_2 | PR_P1_2
Then, you can do a three-way join to bring everything into a single result set:
SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN mapping m
ON t1.Depart_Name = m.Depart_Name_1
INNER JOIN table2 t2
ON m.Depart_Name_2 = t2.Depart_Name
It may seem tedious to create the mapping table, but it may be unavoidable here. If you can think of a way to automate it, then this could cut down on the time spent there.
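A sketch of the mapping-table approach using Python's sqlite3 module and the question's sample data. It assumes PR_P1_2 belongs to table 2 (as the mapping above implies) and makes Depart_Name_2 the mapping's primary key so each spelling maps to exactly one department; both are assumptions. The LEFT JOIN query lists table 2 spellings that have not been mapped yet, a convenient to-do list while building the mapping by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (Depart_Name TEXT PRIMARY KEY, Room_Loc TEXT);
CREATE TABLE table2 (Depart_Name TEXT PRIMARY KEY, Room_Loc TEXT);
INSERT INTO table1 VALUES ('Finance_P1', 'P144'), ('Budget_Rc', 'R2c'),
                          ('Payroll_P1_2', 'P1144');
INSERT INTO table2 VALUES ('Fin_P1', 'P1444'), ('Budget_Rc', 'R2c'),
                          ('Finan_P1_1', 'P1444'), ('PR_P1_2', 'P1140');
-- Hand-maintained: one row per table2 spelling, pointing at the table1 name.
CREATE TABLE mapping (Depart_Name_1 TEXT NOT NULL, Depart_Name_2 TEXT PRIMARY KEY);
INSERT INTO mapping VALUES ('Finance_P1', 'Fin_P1'), ('Budget_Rc', 'Budget_Rc'),
                           ('Payroll_P1_2', 'PR_P1_2');
""")

# The three-way join: each table2 row reported under its table1 name.
joined = conn.execute("""
    SELECT t1.Depart_Name, t2.Depart_Name, t2.Room_Loc
    FROM table1 t1
    JOIN mapping m ON t1.Depart_Name = m.Depart_Name_1
    JOIN table2 t2 ON m.Depart_Name_2 = t2.Depart_Name
    ORDER BY t1.Depart_Name, t2.Depart_Name
""").fetchall()

# table2 spellings nobody has mapped yet.
unmapped = [r[0] for r in conn.execute("""
    SELECT t2.Depart_Name FROM table2 t2
    LEFT JOIN mapping m ON m.Depart_Name_2 = t2.Depart_Name
    WHERE m.Depart_Name_2 IS NULL
""")]
```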

Database schema for end user report designer

I'm trying to implement a feature whereby, apart from all the reports that I have in my system, I will allow the end user to create simple reports (not overly complex reports that involve slicing and dicing across multiple tables with lots of logic).
The user will be able to:
1) Select a base table from a list of allowable tables (e.g., Customers)
2) Select multiple sub tables (e.g., Address table, with AddressId as the field to link Customers to Address)
3) Select the fields from the tables
4) Have basic sorting
Here's the database schema I currently have. I'm quite certain it's far from perfect, so I'm wondering what else I can improve on.
AllowableTables table
This table will contain the list of tables that the user can create their custom reports against.
Id Table
----------------------------------
1 Customers
2 Address
3 Orders
4 Products
ReportTemplates table
Id Name MainTable
------------------------------------------------------------------
1 Customer Report #2 Customers
2 Customer Report #3 Customers
ReportTemplateSettings table
Id TemplateId TableName FieldName ColumnHeader ColumnWidth Sequence
-------------------------------------------------------------------------------
1 1 Customer Id Customer S/N 100 1
2 1 Customer Name Full Name 100 2
3 1 Address Address1 Address 1 100 3
I know this isn't complete, but this is what I've come up with so far. Does anyone have any links to a reference design, or have any inputs as to how I can improve this?

This needs a lot of work even though it’s a relatively simple task. Here are several other columns you might want to include, as well as some other details to take care of:
Store the table name along with the schema name, or store the schema name in an additional column; add columns for sorting and sort order.
Create an additional table to store the child tables that will be used in the report (report id, schema name, table name, column in the child table used to join the tables, column in the parent table used to join the tables, join operator (may not be needed if it is always =)).
Create an additional table that will store the column names (report id, schema name, table name, column name, display as).
There are probably several more things that will come up after you complete this but hopefully this will get you in the right direction.
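A rough sketch of the metadata tables suggested above and of turning one template into SQL, in Python with sqlite3 holding the metadata. All table and column names here (ReportTemplateJoins, ReportTemplateColumns, SortPosition and so on) are hypothetical, the join operator is assumed to always be =, and identifiers are double-quoted because they are spliced into the SQL text rather than bound as parameters. Tables can only come from AllowableTables; a real system would also validate column names against the database catalog. The generated statement is only built here, not executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AllowableTables (Id INTEGER PRIMARY KEY, SchemaName TEXT, TableName TEXT,
                              UNIQUE (SchemaName, TableName));
CREATE TABLE ReportTemplates (Id INTEGER PRIMARY KEY, Name TEXT,
                              MainTableId INTEGER REFERENCES AllowableTables (Id));
-- Child tables joined into a report (join operator assumed to be '=').
CREATE TABLE ReportTemplateJoins (
    TemplateId INTEGER REFERENCES ReportTemplates (Id),
    ChildTableId INTEGER REFERENCES AllowableTables (Id),
    ChildColumn TEXT, ParentColumn TEXT);
CREATE TABLE ReportTemplateColumns (
    TemplateId INTEGER REFERENCES ReportTemplates (Id),
    TableId INTEGER REFERENCES AllowableTables (Id),
    ColumnName TEXT, DisplayAs TEXT, Sequence INTEGER,
    SortPosition INTEGER, SortDirection TEXT CHECK (SortDirection IN ('ASC', 'DESC')));
INSERT INTO AllowableTables VALUES (1, 'dbo', 'Customers'), (2, 'dbo', 'Address');
INSERT INTO ReportTemplates VALUES (1, 'Customer Report #2', 1);
INSERT INTO ReportTemplateJoins VALUES (1, 2, 'AddressId', 'AddressId');
INSERT INTO ReportTemplateColumns VALUES
    (1, 1, 'Id', 'Customer S/N', 1, NULL, NULL),
    (1, 1, 'Name', 'Full Name', 2, 1, 'ASC'),
    (1, 2, 'Address1', 'Address 1', 3, NULL, NULL);
""")

def q(name):
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def table_ref(table_id):
    """schema.table for an entry in AllowableTables."""
    schema, table = conn.execute(
        "SELECT SchemaName, TableName FROM AllowableTables WHERE Id = ?",
        (table_id,)).fetchone()
    return f"{q(schema)}.{q(table)}"

def build_report_sql(template_id):
    """Turn one template's metadata rows into a SELECT statement."""
    (main_id,) = conn.execute(
        "SELECT MainTableId FROM ReportTemplates WHERE Id = ?", (template_id,)).fetchone()
    cols = conn.execute(
        "SELECT TableId, ColumnName, DisplayAs FROM ReportTemplateColumns"
        " WHERE TemplateId = ? ORDER BY Sequence", (template_id,)).fetchall()
    select = ", ".join(f"{table_ref(t)}.{q(c)} AS {q(d)}" for t, c, d in cols)
    sql = f"SELECT {select}\nFROM {table_ref(main_id)}"
    joins = conn.execute(
        "SELECT ChildTableId, ChildColumn, ParentColumn FROM ReportTemplateJoins"
        " WHERE TemplateId = ?", (template_id,)).fetchall()
    for child_id, child_col, parent_col in joins:
        child = table_ref(child_id)
        sql += f"\nJOIN {child} ON {child}.{q(child_col)} = {table_ref(main_id)}.{q(parent_col)}"
    sorts = conn.execute(
        "SELECT TableId, ColumnName, SortDirection FROM ReportTemplateColumns"
        " WHERE TemplateId = ? AND SortPosition IS NOT NULL ORDER BY SortPosition",
        (template_id,)).fetchall()
    if sorts:
        # SortDirection is limited to ASC/DESC by the CHECK constraint.
        sql += "\nORDER BY " + ", ".join(f"{table_ref(t)}.{q(c)} {d}" for t, c, d in sorts)
    return sql

print(build_report_sql(1))
```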

Recommended approach to merging two tables

I have a database schema like this:
[Patients] [Referrals]
| |
[PatientInsuranceCarriers] [ReferralInsuranceCarriers]
\ /
[InsuranceCarriers]
PatientInsuranceCarriers and ReferralInsuranceCarriers are identical, except for the fact that they reference either Patients or Referrals. I would like to merge those two tables, so that it looks like this:
[Patients] [Referrals]
\ /
[PatientInsuranceCarriers]
|
[InsuranceCarriers]
I have two options here
either create two new columns - ID_PatientOrReferral + IsPatient (will tell me which table to reference)
or create two different columns - ID_Patient and ID_Referral, both nullable.
Generally, I try to avoid nullable columns, because I consider them a bad practice (meaning, if you can live w/o nulls, then you don't really need a nullable column) and they are more difficult to work with in code (e.g., LINQ to SQL).
However, I am not sure the first option would be a good idea. I saw that it is possible to create two FKs on ID_PatientOrReferral (one for Patients and one for Referrals), though I can't set any update/delete behavior there, for obvious reasons. I don't know whether the constraint check on insert works that way either, so it looks like the FKs would be there only to mark that relationships exist. Alternatively, I could skip the foreign keys and add the relationships in the DBML manually.
Is any of the approaches better and why?

To expand on my somewhat terse comment:
I would like to merge those two tables
I believe this would be a bad idea. At the moment you have two tables with good clear relation predicates (briefly, what it means for there to exist a record in the table) - and crucially, these relation predicates are different for the two tables:
A record exists in PatientInsuranceCarriers <=> that Patient is associated with that Insurance Carrier
A record exists in ReferralInsuranceCarriers <=> that Referral is associated with that Insurance Carrier
Sure, they are similar, but they are not the same. Consider now what would be the relation predicate of a combined table:
A record exists in ReferralAndPatientInsuranceCarriers <=> {(IsPatient is true and the Patient with ID ID_PatientOrReferral) or alternatively (IsPatient is false and the Referral with ID ID_PatientOrReferral)} is associated with that Insurance Carrier
or if you do it with NULLs
A record exists in ReferralAndPatientInsuranceCarriers <=> {(ID_Patient is not NULL and the Patient with ID ID_Patient) or alternatively (ID_Referral is not NULL and the Referral with ID ID_Referral)} is associated with that Insurance Carrier
Now, I'm not one to automatically suggest that more complicated relation predicates are necessarily worse; but I'm fairly sure that either of the two above is worse than those they would replace.
To address your concerns:
we now have two LINQ to SQL entities, separate controllers and views for each
In general I would agree with reducing duplication; however, only duplication of the same things! Here, is it not the case that all the above are essentially 'boilerplate', and their construction and maintenance can be delegated to suitable development tools?
and have to merge them when preparing data for reports
If you were to create a VIEW containing a UNION for reporting purposes, you would keep the simplicity of the actual data and still be able to report on a combined list, e.g. (making assumptions about column names etc.):
CREATE VIEW InterestingInsuranceCarriers
AS
SELECT
IC.Name InsuranceCarrierName
, P.Name CounterpartyName
, 'Patient' CounterpartyType
FROM InsuranceCarriers IC
INNER JOIN PatientInsuranceCarriers PIC ON IC.ID = PIC.InsuranceCarrierID
INNER JOIN Patient P ON PIC.PatientId = P.ID
UNION
SELECT
IC.Name InsuranceCarrierName
, R.Name CounterpartyName
, 'Referral' CounterpartyType
FROM InsuranceCarriers IC
INNER JOIN ReferralInsuranceCarriers RIC ON IC.ID = RIC.InsuranceCarrierID
INNER JOIN Referral R ON RIC.ReferralId = R.ID

Copying my answer from this question
If you really need A_or_B_ID in TableZ, you have two similar options:
1) Add nullable A_ID and B_ID columns to table z, make A_or_B_ID a computed column using ISNULL on these two columns, and add a CHECK constraint such that only one of A_ID or B_ID is not null
2) Add a TableName column to table z, constrained to contain either A or B. Now create A_ID and B_ID as computed columns which are only non-null when their appropriate table is named (using a CASE expression). Make them persisted too.
In both cases, you now have A_ID and B_ID columns which can have appropriate foreign keys to the base tables. The difference is in which columns are computed. Also, you don't need TableName in option 2 above if the domains of the two ID columns don't overlap, so long as your CASE expression can determine which domain A_or_B_ID falls into.
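A sketch of option 1 applied to the Patients/Referrals question, using Python's sqlite3 module. The table name CounterpartyInsuranceCarriers is hypothetical, and because the combined id is derived here in a view with COALESCE (the standard counterpart of SQL Server's two-argument ISNULL) rather than stored as a computed column, this is an adaptation rather than the exact design above. The CHECK constraint requires exactly one of the two owner columns to be non-null, and each column gets its own ordinary foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Patients          (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Referrals         (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE InsuranceCarriers (ID INTEGER PRIMARY KEY, Name TEXT);

CREATE TABLE CounterpartyInsuranceCarriers (
    ID_Patient         INTEGER REFERENCES Patients (ID),
    ID_Referral        INTEGER REFERENCES Referrals (ID),
    InsuranceCarrierID INTEGER NOT NULL REFERENCES InsuranceCarriers (ID),
    -- exactly one of the two owner columns must be non-null
    CHECK ((ID_Patient IS NULL) <> (ID_Referral IS NULL))
);

-- The combined id, derived rather than stored.
CREATE VIEW CounterpartyInsuranceCarriersView AS
SELECT COALESCE(ID_Patient, ID_Referral) AS ID_PatientOrReferral,
       CASE WHEN ID_Patient IS NOT NULL THEN 'Patient' ELSE 'Referral' END
           AS CounterpartyType,
       InsuranceCarrierID
FROM CounterpartyInsuranceCarriers;

INSERT INTO Patients VALUES (1, 'Pat');
INSERT INTO Referrals VALUES (7, 'Ref');
INSERT INTO InsuranceCarriers VALUES (1, 'Carrier A');
INSERT INTO CounterpartyInsuranceCarriers VALUES (1, NULL, 1), (NULL, 7, 1);
""")

def rejected(values):
    """True if inserting this (ID_Patient, ID_Referral, InsuranceCarrierID) row fails."""
    try:
        conn.execute("INSERT INTO CounterpartyInsuranceCarriers VALUES (?, ?, ?)", values)
        return False
    except sqlite3.IntegrityError:
        return True

rows = conn.execute(
    "SELECT * FROM CounterpartyInsuranceCarriersView ORDER BY CounterpartyType").fetchall()
```

Passing None binds SQL NULL, so rejected((None, None, 1)) exercises the "neither owner" case and rejected((1, 7, 1)) the "both owners" case.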
