Recommended approach to merging two tables - sql-server

I have a database schema like this:
[Patients] [Referrals]
| |
[PatientInsuranceCarriers] [ReferralInsuranceCarriers]
\ /
[InsuranceCarriers]
PatientInsuranceCarriers and ReferralInsuranceCarriers are identical, except for the fact that they reference either Patients or Referrals. I would like to merge those two tables, so that it looks like this:
[Patients] [Referrals]
\ /
[PatientInsuranceCarriers]
|
[InsuranceCarriers]
I have two options here
either create two new columns - ID_PatientOrReferral + IsPatient (will tell me which table to reference)
or create two different columns - ID_Patient and ID_Referral, both nullable.
Generally, I try to avoid nullable columns, because I consider them a bad practice (meaning, if you can live w/o nulls, then you don't really need a nullable column) and they are more difficult to work with in code (e.g., LINQ to SQL).
However I am not sure if the first option would be a good idea. I saw that it is possible to create two FKs on ID_PatientOrReferral (one for Patients and one for Referrals), though I can't set any update/delete behavior there for obvious reasons, I don't know if constraint check on insert works that way, either, so it looks like the FKs are there only to mark that there are relationships. Alternatively, I may not create any foreign keys, but instead add the relationships in DBML manually.
Is any of the approaches better and why?

To expand on my somewhat terse comment:
I would like to merge those two tables
I believe this would be a bad idea. At the moment you have two tables with good clear relation predicates (briefly, what it means for there to exist a record in the table) - and crucially, these relation predicates are different for the two tables:
A record exists in PatientInsuranceCarriers <=> that Patient is associated with that Insurance Carrier
A record exists in ReferralInsuranceCarriers <=> that Referral is associated with that Insurance Carrier
Sure, they are similar, but they are not the same. Consider now what would be the relation predicate of a combined table:
A record exists in ReferralAndPatientInsuranceCarriers <=> {(IsPatient is true and the Patient with ID ID_PatientOrReferral) or alternatively (IsPatient is false and the Referral with ID ID_PatientOrReferral)} is associated with that Insurance Carrier
or if you do it with NULLs
A record exists in ReferralAndPatientInsuranceCarriers <=> {(ID_Patient is not NULL and the Patient with ID ID_Patient) or alternatively (ID_Referral is not NULL and the Referral with ID ID_Referral)} is associated with that Insurance Carrier
Now, I'm not one to automatically suggest that more complicated relation pedicates are necessarily worse; but I'm fairly sure that either of the two above are worse than those they would replace.
To address your concerns:
we now have two LINQ to SQL entities, separate controllers and views for each
In general I would agree with reducing duplication; however, only duplication of the same things! Here, is it not the case that all the above are essentially 'boilerplate', and their construction and maintenance can be delegated to suitable development tools?
and have to merge them when preparing data for reports
If you were to create a VIEW, containing a UNION, for reporting purposes, you would keep the simplicity of the actual data and still have the ability to report on a combined list; eg (making assumptions about column names etc):
CREATE VIEW InterestingInsuranceCarriers
AS
SELECT
IC.Name InsuranceCarrierName
, P.Name CounterpartyName
, 'Patient' CounterpartyType
FROM InsuranceCarriers IC
INNER JOIN PatientInsuranceCarriers PIC ON IC.ID = PIC.InsuranceCarrierID
INNER JOIN Patient P ON PIC.PatientId = P.ID
UNION
SELECT
IC.Name InsuranceCarrierName
, R.Name CounterpartyName
, 'Referral' CounterpartyType
FROM InsuranceCarriers IC
INNER JOIN ReferralInsuranceCarriers RIC ON IC.ID = RIC.InsuranceCarrierID
INNER JOIN Referral R ON PIC.ReferralId = R.ID

Copying my answer from this question
If you really need A_or_B_ID in TableZ, you have two similar options:
1) Add nullable A_ID and B_ID columns to table z, make A_or_B_ID a computed column using ISNULL on these two columns, and add a CHECK constraint such that only one of A_ID or B_ID is not null
2) Add a TableName column to table z, constrained to contain either A or B. now create A_ID and B_ID as computed columns, which are only non-null when their appropriate table is named (using CASE expression). Make them persisted too
In both cases, you now have A_ID and B_ID columns which can have appropriate foreign
keys to the base tables. The difference is in which columns are computed. Also, you
don't need TableName in option 2 above if the domains of the 2 ID columns don't
overlap - so long as your case expression can determine which domain A_or_B_ID
falls into

Related

Distinct with long columns

I have here some database schema with tables having long fields (in MS-SQL-Server of type "text", in Sybase of type "text" too) and I need to retrieve distinct rows.
The tables looks like
create table node (id int primary key, … a few more fields … data text);
create table ref (id int primary key, node_id int, … a few more fields);
For one row in "node", there may be zero or more rows in "ref".
Now I have a query like
SELECT node.* FROM node, ref WHERE node.id = ref.node_id AND ... some more restrictions.
This query returns duples and triples when there is more than a single row in "ref" for some "node_id".
But I need unique rows!
Using SELECT DISTINCT node.* does not work because of the columns of type "text" :-(
In Sybase there is trick, just add "GROUP BY node.id" to the query, voila! You get unique rows returned.
Is there some similar simple Trick for MS-SQL-Server?
I have already a solution with temporary tables, but this seems to be a lot slower maybe the reason is just because of the larger number of statements transferred to the database?
It looks like you are approaching this problem from the wrong direction. Joins are typically used to expand on keys where relevant data is stored in different tables. So it's no surprise you are getting more than one row per node_id.
In your query, you join the two tables together, but then you ignore everything from ref. It looks like you're just trying to filter out ids from node that are not referenced in ref. If that is the case, then you don't want to use a join. The following will work much better
select *
from node
where id in (
select node_id
from ref
where [any restrictions placed on the ref table go here]
)
and [any restrictions placed on the node table go here]
Furthermore, at the risk of teaching you bad join practices, the same thing can be accomplished they way you were trying to do it originally, but it's more painful to write and it's not good practice
select node.col1, node.col2, ... , node.last_col
FROM node
inner join ref on node.id = ref.node_id
where [some restrictions.]
group by node.col1, node.col2, ... , node.last_col

how to use case to combine spelling variations of an item in a table in sql

I have two SQL tables, with deviations of the spellings of department names. I'm needing to combine those using case to create one spelling of the location name. Budget_Rc is the only one with same spelling in both tables. Here's an example:
Table-1 table-2
Depart_Name Room_Loc Depart_Name Room_Loc
1. Finance_P1 P144 1. Fin_P1 P1444
2. Budget_Rc R2c 2. Budget_Rc R2c
3. Payroll_P1_2 P1144 3. Finan_P1_1 P1444
4. PR_P1_2 P1140
What I'm needing to achieve is for the department to be 1 entity, with one room location. These should show as one with one room location in the main table (Table-1).
Depart_Name Room_Loc
1. Finance_P1 F144
2. Budget_Rc R2c
3. Payroll_P1_2 P1144
Many many thanks in advance!
I'd first try a
DECLARE #AllSpellings TABLE(DepName VARCHAR(100));
INSERT INTO #AllSpellings(DepName)
SELECT Depart_Name FROM tbl1 GROUP BY Depart_Name
UNION
SELECT Depart_Name FROM tbl2 GROUP BY Depart_Name;
SELECT DepName
FROM #AllSpellings
ORDER BY DepName
This will help you to find all existing values...
Now you create a clean table with all Departments with an IDENTITY ID-column.
Now you have two choices:
In case you cannot change the table's layout
Use the upper select-statement to find all existing entries and create a mapping table, which you can use as indirect link
Better: real FK-relation
Replace the department's names with the ID and let this be a FOREIGN KEY REFERENCE
Can more than one department be in a Room?
If so then its harder and you can't really write a dynamic query without having a list of all the possible one to many relationships such as Finance has the department key of FIN and they have these three names. You will have to define that table to make any sort of relationship.
For instance:
DEPARTMENT TABLE
ID NAME ROOMID
FIN FINANCE P1444
PAY PAYROLL P1140
DEPARTMENTNAMES
ID DEPARTMENTNAME DEPARTMENTID
1 Finance_P1 FIN
2 Payroll_P1_2 PAY
3 Fin_P1 FIN
etc...
This way you can correctly match up all the departments and their names. I would use this match table to get the data organized and normalized before then cleaning up all your data and then just using a singular department name. Its going to be manual but should be one time if you then clean up the data.
If the room is only ever going to belong to one department you can join on the room which makes it a lot easier.
Since there does not appear any solid rule for mapping department names from table one to table two, the way I would approach this is to create a mapping table. This mapping table will relate the two department names.
mapping
Depart_Name_1 | Depart_Name_2
-----------------------------
Finance_P1 | Fin_P1
Budget_Rc | Budget_Rc
Payroll_P1_2 | PR_P1_2
Then, you can do a three-way join to bring everything into a single result set:
SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN mapping m
ON t1.Depart_Name = m.Depart_Name_1
INNER JOIN table2 t2
ON m.Depart_Name_2 = t2.Depart_Name
It may seem tedious to create the mapping table, but it may be unavoidable here. If you can think of a way to automate it, then this could cut down on the time spent there.

SQL join following foreign key: statically check that LHS is key-preserved

Often you join two tables following their foreign key, so that the row in the RHS table will always be found. Adding the join does not affect the number of rows affected by the query. For example
create table a (x int not null primary key)
create table b (x int not null primary key, y int not null)
alter table a add foreign key (x) references b (x)
Now, assuming you set up some data in these two tables, you can get a certain number of rows from a:
select x from a
Adding a join to b following the foreign key does not change this:
select a.x from a join b on a.x = b.x
However, that is not true of joins in general, which may filter out some rows or (by Cartesian product) add more:
select a.x from a join b on a.x = b.x and b.y != 42 -- probably gives fewer rows
select a.x from a join b on a.x != b.y -- probably gives more rows
When reading SQL code there is no obvious way to tell whether a join is the key-preserving kind, which may add extra columns but does not change the number of rows returned, or whether it has other effects. Over time I have developed a coding convention which I mostly stick to:
if a key-preserving join, use join
if wanting to filter rows, put the filter condition in the where clause
if wanting more rows, sometimes cross join for Cartesian product is the clearest way
These are usually just style issues, since you can often put a predicate into either the join clause or the where clause, for example.
My question
Is there some way to have these key-preserving joins statically checked by the database server when the query is compiled? I understand that the query optimizer already knows that a join on a foreign key will always find exactly one row in the table pointed to by the foreign key. But I would like to tag it in my SQL code for the benefit of human readers. For example, suppose the new syntax fkjoin is used for a join following a foreign key. Then the following SQL fragments will give errors or not:
a fkjoin b on a.x = b.x -- OK
a fkjoin b on a.x = b.x and b.y = 42 -- "Error, join can fail due to extra predicate"
a fkjoin b on a.x = b.y -- "Error, no foreign key from a.x to b.y"
This would be a useful check for me when writing the SQL, and also when returning to read it later. I understand and accept that changing the foreign keys in the database would change what SQL is legal under this scheme - to me, that is a desired outcome, since if a necessary FK ceases to exist then the key-preserving semantics of the query are no longer guaranteed, and I'd like to find out about it.
Potentially, there could be some external SQL static checker tool that does the work, and special comment syntax could be used rather than a new keyword. The checker tool would need access to the database schema to see what foreign keys exist, but it would not need to actually execute the query.
Is there something that does what I want? I am using MSSQL 2008 R2. (Microsoft SQL Server for the pedantic)
I realize that you are interested in indicating whether a particular join on particular columns is on a FK, or is a restriction, or perhaps is of some other case, or none of the preceding. (And it's not clear what you mean by "success" or "failure" of a join, or its relevance.) Whereas focusing on that information, as explained below, is to miss focusing on more important and fundamental things.
A base table has a "meaning" or "predicate (expression)" that is a fill-in-the-(named-)blanks statement given by the DBA. The names of the blanks of the statement are the columns of the table. Rows that fill in the blanks to make a true proposition about the world go in the table. Rows that fill in the blanks to make a false proposition about the world are left out. Ie a table holds the rows that satisfy its statement. You cannot set a base table to a certain value without knowing its statement, observing the world and putting the appropriate rows into the table. You cannot know about the world from base tables except by knowing its statement and taking present-row propositions to be true and absent-row propositions to be false. Ie you need its statement to use the database.
Notice that the typical syntax for a table declaration looks like a shorthand for its statement:
-- employee [eid] is named [name] and lives at [address] in ...
EMPLOYEE(eid,name,address,...)
You can make bigger statements by putting logic operators AND, OR, AND NOT, EXISTS name, AND condition, etc between/around other statements. If you translate a statement to a relation/SQL expression by converting
a table's statement to its name
AND to JOIN
OR to UNION
AND NOT to EXCEPT/MINUS
EXISTS C,... [...] to SELECT all columns but C,... FROM ...
AND condition to ON/WHERE condition
IMPLIES to SUBSETOF
IFF to =
then you get a relation expression that calculates the rows that make the statement true. (Arguments of UNION & EXCEPT/MINUS need the same columns.) So just as every table holds the rows satisfying its statement, a query expression holds the rows that satisfy its statement. You cannot know about the world from a query result except by knowing its statement and taking its present-row propositions to be true and absent-row propositions to be false. Ie you need its statement to compose or interpret a query. (Observe that this is true regardless of what constraints hold.)
This is the foundation of the relational model: table expressions calculate rows satisfying corresponding statements. (To the extent that SQL differs, it is literally illogical.)
Eg: If table T holds the rows that make statement T(...,T.Ci,...) true and table U holds the rows that make statement U(...,U.Cj,...) true then table T JOIN U holds the rows that make statement T(...,T.Ci,...) AND U(...,U.Cj,...) true. That is the semantics of JOIN that is important to using a database. You can always join, and a join always has a meaning, and it is always the AND of its operands' meanings. Whether any tables happen to have FKs to others just isn't particularly helpful for reasoning about updates or queries. (The DBMS uses constraints for when you make mistakes.)
A constraint expression just corresponds to a proposition aka always-true statement about the world and simultaneusly to one about base tables. Eg for C UNIQUE NOT NULL in U, the following three expressions are equivalent to each other:
FOREIGN KEY T (C) REFERENCES U (C)
EXISTS columns other than C T(...,C,...)
IMPLIES EXISTS columns other than C U(...,C,...)
(SELECT C FROM T) SUBSETOF (SELECT C FROM U)
It is true that this implies that SELECT C FROM T JOIN U ON T.C = U.C = SELECT C FROM U, ie a join on a FK returns the same number of rows. But so what? The join's meaning is still the same function of its arguments'.
Whether a particular join on a particular column set involves a foreign key is just not germane to understanding the meaning of a query.

One to Many Database Table Relationship

I do have two tables from the database..
Consultation Table
-ConsultationNo -PK
-PatientNo -FK
-Diagnosis
-Etc...
VitalSign Table
-VitalSignNo -PK
-Weight
-Height
-HeartRate
-BloodPressure
-Etc
I need to join these two tables like this..
Consultation Table
-ConsultationNo -PK
-PatientNo -FK
**-VitalSignNo** -FK
-Diagnosis
-Etc...
But sometimes, my VitalSign Table will not accept any values, therefore the relationship between those two tables would not be enforced, what should I do?
Use an outer join like so...
Select * from Consultation
Left join VitalSign on (Consultation.ConsultationNo = vitalsign.ConsultationNo)
You will get all rows from consultation, and the matching rows from Vitalsign. When there are no rows in Vitalsign all those columns will come back null, but you will still get the consultation rows.
This might not be exactly right for your situation because the table structure in your question doesn't look complete. That, is you didn't have a foreign key in either table referencing the PK of the other.

Database tables: One-to-many of different types

Due to non-disclosure at my work, I have created an analogy of the situation. Please try to focus on the problem and not "Why don't you rename this table, m,erge those tables etc". Because the actual problem is much more complex.
Heres the deal,
Lets say I have a "Employee Pay Rise" record that has to be approved.
There is a table with single "Users".
There are tables that group Users together, forexample, "Managers", "Executives", "Payroll", "Finance". These groupings are different types with different properties.
When creating a "PayRise" record, the user who is creating the record also selects both a number of these groups (managers, executives etc) and/or single users who can 'approve' the pay rise.
What is the best way to relate a single "EmployeePayRise" record to 0 or more user records, and 0 or more of each of the groupings.
I would assume that the users are linked to the groups? If so in this case I would just link the employeePayRise record to one user that it applies to and the user that can approve. So basically you'd have two columns representing this. The EmployeePayRise.employeeId and EmployeePayRise.approvalById columns. If you need to get to groups, you'd join the EmployeePayRise.employeeId = Employee.id records. Keep it simple without over-complicating your design.
My first thought was to create a table that relates individual approvers to pay rise rows.
create table pay_rise_approvers (
pay_rise_id integer not null references some_other_pay_rise_table (pay_rise_id),
pay_rise_approver_id integer not null references users (user_id),
primary key (pay_rise_id, pay_rise_approver_id)
);
You can't have good foreign keys that reference managers sometimes, and reference payroll some other times. Users seems the logical target for the foreign key.
If the person creating the pay rise rows (not shown) chooses managers, then the user interface is responsible for inserting one row per manager into this table. That part's easy.
A person that appears in more than one group might be a problem. I can imagine a vice-president appearing in both "Executive" and "Finance" groups. I don't think that's particularly hard to handle, but it does require some forethought. Suppose the person who entered the data changed her mind, and decided to remove all the executives from the table. Should an executive who's also in finance be removed?
Another problem is that there's a pretty good chance that not every user should be allowed to approve a pay rise. I'd give some thought to that before implementing any solution.
I know it looks ugly but I think somethimes the solution can be to have the table_name in the table and a union query
create table approve_pay_rise (
rise_proposal varchar2(10) -- foreign key to payrise table
, approver varchar2(10) -- key of record in table named in other_table
, other_table varchar2(15) );
insert into approve_pay_rise values ('prop000001', 'e0009999', 'USERS');
insert into approve_pay_rise values ('prop000001', 'm0002200', 'MANAGERS');
Then either in code a case statement, repeated statements for each other_table value (select ... where other_table = '' .. select ... where other_table = '') or a union select.
I have to admit I shudder when I encounter it and I'll now go wash my hands after typing a recomendation to do it, but it works.
Sounds like you'd might need two tables ("ApprovalUsers" and "ApprovalGroups"). The SELECT statement(s) would be a UNION of UserIds from the "ApprovalUsers" and the UserIDs from any other groups of users that are the "ApprovalGroups" related to the PayRiseId.
SELECT UserID
INTO #TempApprovers
FROM ApprovalUsers
WHERE PayRiseId = 12345
IF EXISTS (SELECT GroupName FROM ApprovalGroups WHERE GroupName = "Executives" and PayRiseId = 12345)
BEGIN
SELECT UserID
INTO #TempApprovers
FROM Executives
END
....
EDIT: this would/could duplicate UserIds, so you would probably want to GROUP BY UserID (i.e. SELECT UserID FROM #TempApprovers GROUP BY UserID)

Resources