merging two duplicate contacts/ColdFusion - database

Having to do with data integrity - I maintain a coldfusion database at a small shop that keeps addresses of different contacts. These contacts sometimes contain notes in them.
When you are merging two duplicate contacts, one may be created in 2002 and one in 2008. If the contact in 2002 has notes prior to 2008, my question would be does it matter if you merge these contacts and keep the 2008 contact's ID number? Would that affect the data integrity or create any sort of issues with the notes earlier than 2008?
I hope I've accurately described my scenario, as I am not familiar with the proper technical terms.
I really appreciate the help sir!

I will say that the fact that the app is ColdFusion is pretty well irrelevant to your problem.
It seems like some of what you're asking depends on your business requirements. Do you need to retain older notes?
As other folks are saying, it depends in large part on your table structure. If, as I suspect, you've got just one table that has a NOTES column in it, you'll need to figure out how to concatenate the values in multiple rows that all refer to the same person.

It sounds like you have two tables - contacts and notes. The notes table has a foreign key back to the contacts table to record which contact a note belongs to.
So, imagine two Contacts - Bill (primary key 1, created in 2002) and William (primary key 2, created in 2008).
Imagine one Note with a foreign key 1 (ie this note belongs to Bill).
If you merge Bill and William, and only keep the William record, then you would need to update the foreign key from 1 (Bill - deleted) to 2 (William) on the note or it will not display on William's record.
(If you add further details about your table structure we can probably help more.)

Related

How to model a 'history' table from a join table

I have a need to track some history for a table that contains ids from other tables:
I want to track the status of the company_device table such that I can make entries to know when the status of the relationship changed (when a device was assigned to a company, and when it was unassigned, etc). The company_device table would only contain current, existing relationships. So I'd like to do 'something' like this:
But this won't work, because it requires there to be a record in company_device for the FK to be satisfied in the company_device_history table. For example, if I
insert into company_device values (1,1);
insert into company_device_history values (1,1,'Assigned',now());
Then I can't ever remove the record from company_device because of the foreign key constraint. So I've currently settled on this:
so I'm not restricted by the foreign key.
My question is : is there a better way to model this? I could add the status and effective_date to the company_device table and query based on status or effective_date, but that doesn't seem to be a good solution to me. I'd like to know how others might approach this.
When looking exclusively at the problem (that is, when modeling the nature of the business problem at hand), the only thing you need is one single table COMPANY_DEVICE_ASSIGNMENT with four columns C_ID, D_ID, FROM and TO, telling you that "device D_ID was assigned to company C_ID from FROM to TO".
Systems do exist that allow you to work on this basis, however none of them speak SQL (an excellent book with an in-depth treatment of the subject matter of temporal data, I'd even call it the "canonical" work, is "Time and Relational Theory - Googling for it can't miss). If you do this in an SQL system, this "straightforward" approach is not going to get you very far. In that case, your practical options are limited by :
what temporal features are offered by the DBMS you want/can/must use
what features are supported by whatever modeling tool you want/can/must use to generate DDL.
As Neil's comment stated : the most recent version of the SQL standard had "temporal support" as its main novelty, and they are absolutely relevant to your problem, but the products actually offering these features are relatively few and far between.

Elegant normalization without adding fields, extra table. Best relationship

I have 2 tables I am trying to normalize. The problem is I don't want to create an offhand table with new fields, though a link table perhaps works. What is the most elegant way to convey that the "Nintendo" entry is BOTH a publisher and a developer? I don't want "Nintendo" to be duplicated. I am thinking a many-to-many relationship can be key here.
I want to stress that I absolutely want the developer and a publisher tables to remain. I don't mind creating a link between the 2 with a new relationship.
Here are the 2 tables I am trying to normalize:
Below is a solution I tried (I don't like it):
There is nothing wrong with your two tables.
In fact all you need is
developer(name) -- company [name] is a developer
publisher(name) -- company [name] is a publisher
Your changes have nothing to do with normalization. Normalization never creates new column names. 'I don't want "Nintendo" to be duplicated' is misconceived. There is nothing wrong per se with values appearing in multiple places. See the answers by sqlvogel & myself here.
BUT: Depending on what it means for a row to be in one of your tables there might be a better design to reduce errors because the two tables' values could be "constrained" ie depend on each other. That has something to do with "redundancy" but it is about constraints and does not involve normalization. And for us to address it you have to tell us exactly when a row goes into each table based on the world situation.
If you don't want to repeat the strings for implementation(-dependent) reasons (space taken or speed of operations at the expense of more joins) then add a table of name ids and strings (actually company ids and names) and replace your old name columns and values by company id columns and values. But that's not normalization, that's complicating your schema for the sake of implementation-dependent data optimization tradeoffs. (And you should demonstrate this is needed and works.)
The currently accepted answer (tables Game_Company, Company_Role & Game_Company_Role) just adds a lot of redundant data. Just like your question adds three redundant tables. The original two tables already say what companies are developers and which are publishers. The other tables are just views/queries on the two!
If you want a new table for "[id] identifies a company named [name] with ..." then this is a case of developers and publisher as subtypes of supertype company. Search on database subtypes. See this answer. Then you would use company id instead of name to identify companies. You could also then further simplify (!) by using company id as the only column in tables developer and publisher and also everywhere else instead of developer_id and publisher_id.
"Redundancy" is not about values appearing in multiple places. It is about multiple rows stating the same thing about the application. When using a design like that there are two basic problems: to say certain things multiple rows are involved (while the normalized version involves just one row); and there is no way to say just one of the things at a time (which normalization can help with). If you make two different independent statements about Nintendo then you need two tables and Nintendo mentioned in each one. Re rows making statements about the application see this. (And search my other answers re a table's "statement" or criterion".) Normalization helps because it replaces tables whose rows state things of the form "... AND ..." by other tables that state the "..." separately. See this and this. (Normalization is commonly erroneously thought to involve or include avoiding multiple similar columns, avoiding columns whose values have repetitive structure and/or replacing strings by ids, but although these can be good design ideas they're not normalization.)
In comments, chat and another answer you gave this starting point:
Here's the simplest design. (I'll assume game titles are not unique so you need game_ids.)
-- game [game_id] with title [title] released on [release_date] is rated [rating]
game(game_id,title,release_date,rating)
game_developer(game_id,name) -- game [game_id] is developed by company [name]
game_publisher(game_id,name) -- game [game_id] is published by company [name]
game_platform(game_id,name) -- game [game_id] is on platform [name]
Only if you want a separate list of companies so that a company can exist without developing or publishing and/or can have its own data do you need to add:
company(name,...) -- [name] identifies a company
Only if you want role-specific data for developers and publishers do you need to add:
developer(name,...) -- developer [name] has ...
publisher(name,...) -- publisher [name] has ...
The relevant foreign keys of the various options are straightward.
None of your versions need _ids. Your versions 2 & 3 won't work because they don't say what companies develop a game or what companies publish a game. You don't need roles but if you have them (Verison 2) then you need a table "game [game_id] has company [name] as [role]". Otherwise (Verision 3) you need tables for "[game_id] is developed by company [name]" and "game [game_id] is published by company [name]". Wherever you differ from my designs ask yourself why you have additional structure and why you can do without it and (possibly) why you would explicitly want it anyway.
I think you want something like this:
Game_Company
ID Name
1 Retro Studios
2 HAL Laboratories
3 Nintendo
...
Company_Role
ID Name
1 Developer
2 Publisher
...
Game_Company_Role
CompanyID RoleID
1 1
2 1
3 1
3 2
...
To get a list of all companies that have role 'Developer':
SELECT gc.name
FROM Game_Company gc JOIN Game_Company_Role gcr ON gcr.CompanyID=gc.ID
WHERE gcr.RoleID = 1
This is a bit generic approach to the problem, it may be of interest. As #Dour High Arch has pointed out in his solution, the Developer and Publisher are just roles for a 'party'. Each part has 0,1 or more roles with a given product and roles may overlap.This is good and bad. For example, a product may be developed by 5 developers but published by at most 1 publisher.
I have chosen to introduce a serial_id as system generated PK, but this is not mandatory. You could use the 3FKs as a PK and not user the serial_id.
Notice that having a party as a generalization of different entity types is not always good since 1 or more columns will have to be set to not mandatory if it is not common to all parties, however, this is very common in real applications.
Convention:
name_PK = Primary Key,
name_FK = Foreign Key
Here are three final solutions as proposed by the comments. You can see the table being broken down from the top "un-normalized" table.
The rules are as follows:
1 game can have 1 or many developers and 1 developer can have 1 or many games.
1 game can have 1 or many publishers and 1 publisher can have 1 or many games.
1 game can have 1 or many platforms and 1 platform can have 1 or many games.
Version 1
I left the 2 "Nintendo" entries in red. According to research and implementation, this is not technically redundant data. See my comments under philipxy's answer. This looks simple and elegant. 4 tables with a many-to-many relationship.
Here is the relationship diagram (4 tables and 3 link tables):
Verison 2
Version 1 "repeats" "Nintendo" but Version 2 has a "Company" table instead. Compare the 2 different versions. What is the right way?
Version 3
Here is the subtyping philipxy was talking about. How is this version?

Linking an address table to multiple other tables

I have been asked to add a new address book table to our database (SQL Server 2012).
To simplify the related part of the database, there are three tables each linked to each other in a one to many fashion: Company (has many) Products (has many) Projects and the idea is that one or many addresses will be able to exist at any one of these levels. The thinking is that in the front-end system, a user will be able to view and select specific addresses for the project they specify and more generic addresses relating to its parent product and company.
The issue now if how best to model this in the database.
I have thought of two possible ideas so far so wonder if anyone has had a similar type of relationship to model themselves and how they implemented it?
Idea one:
The new address table will additionally contain three fields: companyID, productID and projectID. These fields will be related to the relevant tables and be nullable to represent company and product level addresses. e.g. companyID 2, productID 1, projectID NULL is a product level address.
My issue with this is that I am storing the relationship information in the table so if a project is ever changed to be related to a different product, the data in this table will be incorrect. I could potentially NULL all but the level I am interested in but this will make getting parent addresses a little harder to get
Idea two:
On the address table have a typeID and a genericID. genericID could contain the IDs from the Company, Product and Project tables with the typeID determining which table it came from. I am a little stuck how to set up the necessary constraints to do this though and wonder if this is going to get tricky to deal with in the future
Many thanks,
I will suggest using Idea one and preventing Idea two.
Second Idea is called Polymorphic Association anti pattern
Objective: Reference Multiple Parents
Resulting side effect: Using dual-purpose foreign key will violating first normal form (atomic issue), loosing referential integrity
Solution: Simplify the Relationship
The simplification of the relationship could be obtained in two ways:
Having multiple null-able forging keys (idea number 1): That will be
simple and applicable if the tables(product, project,...) that using
the relation are limited. (think about when they grow up to more)
Another more generic solution will be using inheritance. Defining a
new entity as the base table for (product, project,...) to satisfy
Addressable. May naming it organization-unit be more rational. Primary key of this organization_unit table will be the primary key of (product, project,...). Other collections like Address, Image, Contract ... tables will have a relation to this base table.
It sounds like you could use Junction tables http://en.wikipedia.org/wiki/Junction_table.
They will give you the flexibility you need to maintain your foreign key restraints, as well as share addresses between levels or entities if that is desired.
One for Company_Address, Product_Address, and Project_Address

Database Design: Explain this schema

Full disclosure...Trying feverishly here to learn more about databases so I am putting in the time and also tried to get this answer from the source to no avail.
Barry Williams from databaseanswers has this schema posted.
Clients and Fees Schema
I am trying to understand the split of address tables in this schema. Its clear to me that the Addresses table contains the details of a given address. The Client_Addresses and Staff_Addresses tables are what gets me.
1) I understand the use of Primary Foreign Keys as shown but I was under the assumption that when these are used you don't have a resident Primary Key in that same table (date_address_from in this case). Can someone explain the reasoning for both and put it into words how this actually works out?
2) Why would you use date_address_from as the primary key instead of something like client_address_id as the PK? What if someone enters two addresses in one day would there be conflicts in his design? If so or if not, what?
3) Along the lines of normalization...Since both date_address_from and date_address_to are the same in the Client_Addresses and Staff_Addresses table should those fields just not be included in the main Address table?
Evaluation
First an Audit, then the specific answers.
This is not a Data Model. This is not a Database. It is a bucket of fish, with each fish drawn as a rectangle, and where the fins of one fish are caught in the the gills of another, there is a line. There are masses of duplication, as well as masses of missing elements. It is completely unworthy of using as an example to learn anything about database design from.
There is no Normalisation at all; the files are very incomplete (see Mike's answer, there are a hundred more problem like that). The other_details and eg.s crack me up. Each element needs to be identified and stored: StreetNo, ApartmentNo, StreetName, StreetType, etc. not line_1_number_street, which is a group.
Customer and Staff should be normalised into a Person table, with all the elements identified.
And yes, if Customer can be either a Person or an Organisation, then a supertype-subtype structure is required to support that correctly.
So what this really is, the technically accurate terms, is a bunch of flat files, with descriptions for groups of fields. Light years distant from a database or a relational one. Not ready for evaluation or inspection, let alone building something with. In a Relational Data Model, that would be approximately 35 normalised tables, with no duplicated columns.
Barry has (wait for it) over 500 "schemas" on the web. The moment you try to use a second "schema", you will find that (a) they are completely different in terms of use and purpose (b) there is no commonality between them (c) let's say there was a customer file in both; they would be different forms of customer files.
He needs to Normalise the entire single "schema" first,
then present the single normlaised data model in 500 sections or subject areas.
I have written to him about it. No response.
It is important to note also, that he has used some unrecognisable diagramming convention. The problem with these nice interesting pictures is that they convey some things but they do not convey the important things about a database or a design. It is no surprise that a learner is confused; it is not clear to experienced database professionals. There is a reason why there is a standard for modelling Relational databases, and for the notation in Data Models: they convey all the details and subtleties of the design.
There is a lot that Barry has not read about yet: naming conventions; relations; cardinality; etc, too many to list.
The web is full of rubbish, anyone can "publish". There are millions of good- and bad-looking "designs" out there, that are not worth looking at. Or worse, if you look, you will learn completely incorrect methods of "design". In terms of learning about databases and database design, you are best advised to find someone qualified, with demonstrated capability, and learn from them.
Answer
He is using composite keys without spelling it out. The PK for client_addresses is client_id, address_id, date_address_from). That is not a bad key, evidently he expects to record addresses forever.
The notion of keeping addresses in a separate file is a good one, but he has not provided any of the fields required to store normalised addresses, so the "schema" will end up with complete duplication of addresses; in which case, he could remove addresses, and put the lines back in the client and staff files, along with their other_details, and remove three files that serve absolutely no purpose other than occupying disk space.
You are thinking about Associative Tables, which resolve the many-to-many relations in Databases. Yes, there, the columns are only the PKs of the two parent tables. These are not Associative Tables or files; they contain data fields.
It is not the PK, it is the third element of the PK.
The notion of a person being registered at more than one address in a single day is not reasonable; just count the one address they slept the most at.
Others have answered that.
Do not expect to identify any evidence of databases or design or Normalisation in this diagram.
1) In each of those tables the primary key is a compound key consisting of three attributes: (staff_id, address_id, date_address_from) and (client_id, address_id, date_address_from). This presumably means that the mapping of clients/staff to addresses is expected to change over time and that the history of those changes is preserved.
2) There's no obvious reason to create a new "id" attribute in those tables. The compound key does the job adequately. Why would you want to create the same address twice for the same client on the same date? If you did then that might be a reason to modify the design but that seems like an unlikely requirement.
3) No. The apparent purpose is that they are the applicable dates for the mapping of address to client/staff - not dates applicable to the address alone.
3) Along the lines of
normalization...Since both
date_address_from and date_address_to
are the same in the Client_Addresses
and Staff_Addresses table should those
fields just not be included in the
main Address table?
No. But you did find a problem.
The designer has decided that clients and staff are two utterly different things. By "utterly different", I mean they have no attributes in common.
That's not true, is it? Both clients and staff have addresses. I'm sure most of them have telephones, too.
Imagine that someone on staff is also a client. How many places is that person's name stored? That person's address? Can you hear Mr. Rogers in the background saying, "Can you spell 'update anomaly'? . . . I knew you could."
The problem is that the designer was thinking of clients and staff as different kinds of people. They're not. "Client" describes a business relationship between a service provider (usually, that is, not a retailer) and a customer, which might be either a person or a company. "Staff" describes a employment relationship between a company and a person. Not different kinds of people--different kinds of relationships.
Can you see how to fix that?
This 2 extra tables enables you to have address history per one person.
You can have them both in one table, but since staff and client are separated, it is better to separate them as well (b/c client id =1 and staff id =1 can't be used on the same table of address).
there is no "single" solution to a design problem, you can use 1 person table and then add a column to different between staff and client. BUT The major Idea is that the DB should be clear, readable and efficient, and not to save tables.
about 2 - the pk is combined, both clientID, AddressID and from.
so if someone lives 6 month in the states, then 6 month in Israel, and then back to the states, to the same address - you need only 2 address in address table, and 3 in the client_address.
The idea of heaving the from_Date as part of the key is right, although it doesn't guaranty data integrity - as you also need manually to check that there isn't overlapping dates between records of the same person.
about 3 - no (look at 2).
Viewing the data model, i think:
1) PF means that the field is both part of the primary key of the table and foreign key with other table.
2) In the same way, the primary key of Staff_Addresses is {staff_id,address_id,date_adderess_from} not just date_adderess_from
3) The same that 2)
In reference to Staff_Addresses table, the Primary Key on date_address_from basically prevents a record with the same staff_id/address_id entered more than once. Now, i'm no DBA, but i like my PKs to be integers or guids for performance reasons/faster indexing. If i were to do this i would make a new column, say, Staff_Address_Id and make it the PK column and put a unique constraint on staff_id/address_id/date_address_from.
As for your last concern, Addresses table is really a generic address storage structure. It shouldn't care about date ranges during which someone resided there. It's better to be left to specific implementations of an address such as Client/Staff addresses.
Hope this helps a little.

Table "Inheritance" in SQL Server

I am currently in the process of looking at a restructure our contact management database and I wanted to hear peoples opinions on solving the problem of a number of contact types having shared attributes.
Basically we have 6 contact types which include Person, Company and Position # Company.
In the current structure all of these have an address however in the address table you must store their type in order to join to the contact.
This consistent requirement to join on contact type gets frustrating after a while.
Today I stumbled across a post discussing "Table Inheritance" (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server).
Basically you have a parent table and a number of sub tables (in this case each contact type). From there you enforce integrity so that a sub table must have a master equivalent where it's type is defined.
The way I see it, by this method I would no longer need to store the type in tables like address, as the id is unique across all types.
I just wanted to know if anybody had any feelings on this method, whether it is a good way to go, or perhaps alternatives?
I'm using SQL Server 05 & 08 should that make any difference.
Thanks
Ed
I designed a database just like the link you provided suggests. The case was to store the data for many different technical reports. The number of report types is undefined and will probably grow to about 40 different types.
I created one master report table, that has an autoincrement primary key. That table contains all common information like customer, testsite, equipmentid, date etc.
Then I have one table for each report type that contains the spesific information relating to that report type. That table have the same primary key as the master and references the master as well.
My idea for splitting this into different tables with a 1:1 relation (which normally would be a no-no) was to avoid getting one single table with a huge number of columns, that gets very difficult to maintain as your constantly adding columns.
My design with table inheritance gave me segmented data and expandability without beeing difficult to maintain. The only thing I had to do was to write special a special save method to handle writing to two tables automatically. So far I'm very happy with the design and haven't really found any drawbacks, except for a little more complicated save method.
Google on "gen-spec relational modeling". You'll find a lot of articles discussing exactly this pattern. Some of them focus on table design, while others focus on an object oriented approach.
Table inheritance pops up in a few of them.
I know this won't help much now, but initially it may have been better to have an Entity table rather than 6 different contact types. Then each Entity could have as many addresses as necessary and there would be no need for type in the join.
You'll still have the problem that if you want the sub-type fields and you have only the master contact, you'll have to know what table to go looking at - or else join to all of them. But otherwise this is a workable solution to a common problem.
Another possibility (fairly similar in structure, but different in how you think of it) is to simply put all your contacts into one table. Then for the more specific fields (birthday say for people and department for position#company) create separate tables that are associated with that contact.
Contact Table
--------------
Name
Phone Number
Address Table
-------------
Street / state, etc
ContactId
ContactBirthday Table
--------------
Birthday
ContactId
Departments Table
-----------------
Department
ContactId
It requires a different way of thinking of things though - instead of thinking of people vs. companies, you think of the various functional requirements for the task at hand - if you want to send out birthday cards, get all the contacts that have birthdays associated with them, etc..
I'm going to go out on a limb here and suggest you should rethink your normalization strategy (as you seem to be lucky enough to be able to rethink your schema quite fundamentally). If you typically store an address for each contact, then your contact table should have the address fields in it. Alternatively if the address is stored per company then the address should be stored in the company table and your contacts linked to that company.
If your contacts only have one address, or one (or even 3, just not 'many') instance of the other fields, think about rationalizing them into a single table. In my experience having a few null fields is a far better alternative than needing left joins to data you aren't sure exists.
Fortunately for anyone who vehemently disagrees with me you did ask for opinions! :) IMHO you should only normalize when you really need to. Where you are rethinking schemas, denormalization should be considered at every opportunity.
When you have a 7th type, you'll have to create another table.
I'm going to try this approach. Yes, you have to create new tables when you have a new type, but since this table will probably have different columns, you'll end up doing this anyway if you don't use this scheme.
If the tables that inherit the master don't differentiate much from one another, I'd recommend you try another approach.
May I suggest that we just add a Type table. Ie a person has an address, name etc then the student, teacher as each use case presents its self we have a PersonType table that has an entry from the person table to n types and the subsequent new tables teacher, alien, singer as the system eveolves...

Resources