I am trying to pull information about people from ten local data sources for a law enforcement organisation. I have created a table called Person:
CREATE TABLE Person
(ID int identity,
DateOfBirth datetime,
Occupation varchar(100),
LastVisit datetime,
datecreated datetime,
datemodified datetime,
primary key (id));
Each of the ten databases holds: DateOfBirth, Occupation, LastVisit, datecreated and datemodified so it is simple to create this table.
Some of the databases contain other information. For example, database 1 contains addresses, database 2 contains vehicles, database 3 contains property, database 4 contains intelligence, etc.
I am trying to think of the best way to model these requirements. I believe there are two options:
Create tables for the additional information e.g. Vehicles table, addresses table, property table etc. There would be a zero to many relationship between Person and each of the additional tables.
Use a more dynamic approach i.e. CustomTable1, CustomTable2, CustomTable3 etc. CustomTable1 would have CustomField1, CustomField2 etc. This approach would mean introducing a layer of abstraction above the additional tables. Is there a design pattern for this that I am not aware of?
(whispering) Are you a Java programmer?
If you build a table to store data about vehicles, and you name it "CustomTable17", everybody that writes queries will curse you until your dying day. You will even curse yourself.
Don't do that. In your case, you know every attribute you need to model before you even start. You don't need "more dynamic". You don't need a "layer of abstraction".
Store data about vehicles in a table named "vehicles", unless there's a compelling reason to use a different name. "A more dynamic approach" and "a layer of abstraction" aren't compelling reasons to use a different name.
"This table isn't for just any vehicle. It's only for impounded vehicles." Now that would be a compelling reason to use a different name. But we're talking about a name like "impounded_vehicles", not a name like "CustomTable135".
When I've had to consolidate data from multiple sources, I have sometimes found it useful to store the source of each row. Give that some thought.
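A minimal sketch of that suggestion, run through Python's sqlite3 module as a stand-in for the real DBMS so it can be executed as-is. The source_db column name, the registration column and the sample rows are assumptions made up for illustration; only the Person columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE person (
    id            INTEGER PRIMARY KEY,
    date_of_birth TEXT,
    occupation    TEXT,
    source_db     INTEGER NOT NULL   -- hypothetical: which of the ten sources supplied the row
);
CREATE TABLE vehicles (
    id           INTEGER PRIMARY KEY,
    person_id    INTEGER NOT NULL REFERENCES person(id),
    registration TEXT NOT NULL,      -- hypothetical column
    source_db    INTEGER NOT NULL
);
INSERT INTO person VALUES (1, '1980-05-01', 'Driver', 2);
INSERT INTO vehicles (person_id, registration, source_db)
VALUES (1, 'AB12 CDE', 2), (1, 'XY34 ZZZ', 2);
""")

# A plainly named table keeps the zero-to-many query readable.
rows = conn.execute("""
    SELECT p.id, v.registration, v.source_db
    FROM person AS p
    LEFT JOIN vehicles AS v ON v.person_id = p.id
    ORDER BY v.registration
""").fetchall()
print(rows)
```

With the source stored on each row, a bad import from one database can be found and reloaded without touching rows that came from the other nine.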
Related
I have been asked to add a new address book table to our database (SQL Server 2012).
To simplify the related part of the database, there are three tables each linked to each other in a one to many fashion: Company (has many) Products (has many) Projects and the idea is that one or many addresses will be able to exist at any one of these levels. The thinking is that in the front-end system, a user will be able to view and select specific addresses for the project they specify and more generic addresses relating to its parent product and company.
The issue now is how best to model this in the database.
I have thought of two possible ideas so far, so I wonder whether anyone has had a similar type of relationship to model and how they implemented it.
Idea one:
The new address table will additionally contain three fields: companyID, productID and projectID. These fields will be related to the relevant tables and be nullable to represent company and product level addresses. e.g. companyID 2, productID 1, projectID NULL is a product level address.
My issue with this is that I am storing the relationship information in the address table, so if a project is ever moved to a different product, the data in this table will be incorrect. I could potentially NULL all but the level I am interested in, but this would make parent addresses a little harder to retrieve.
Idea two:
On the address table, have a typeID and a genericID. genericID could contain the IDs from the Company, Product and Project tables, with the typeID determining which table it came from. I am a little stuck on how to set up the necessary constraints to do this, though, and wonder whether this is going to get tricky to deal with in the future.
Many thanks,
I suggest using Idea one and avoiding Idea two.
The second idea is known as the Polymorphic Association anti-pattern.
Objective: Reference Multiple Parents
Resulting side effect: Using a dual-purpose foreign key violates first normal form (the atomicity rule) and loses referential integrity
Solution: Simplify the Relationship
The simplification of the relationship could be obtained in two ways:
Having multiple nullable foreign keys (idea number 1): this is simple and applicable if the tables (product, project, ...) that use the relation are limited in number. (Think about what happens when more of them are added.)
Another, more generic solution is to use inheritance: define a new entity as the base table for (product, project, ...) so that they all become addressable. Naming it organization_unit may be more rational. The primary key of this organization_unit table will also be the primary key of (product, project, ...). Other collections such as the Address, Image and Contract tables will have a relation to this base table.
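A rough sketch of the inheritance option, using Python's sqlite3 module as a stand-in for SQL Server so it can be run as-is. The unit_type column, the line1 column and the sample rows are assumptions for illustration; the organization_unit name is the one suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
-- Base table: every company, product or project gets one row here.
CREATE TABLE organization_unit (
    id        INTEGER PRIMARY KEY,
    unit_type TEXT NOT NULL CHECK (unit_type IN ('company', 'product', 'project'))
);
-- Subtype tables share the base table's primary key.
CREATE TABLE company (
    id   INTEGER PRIMARY KEY REFERENCES organization_unit(id),
    name TEXT NOT NULL
);
CREATE TABLE product (
    id         INTEGER PRIMARY KEY REFERENCES organization_unit(id),
    company_id INTEGER NOT NULL REFERENCES company(id),
    name       TEXT NOT NULL
);
-- Address needs a single, ordinary foreign key.
CREATE TABLE address (
    id      INTEGER PRIMARY KEY,
    unit_id INTEGER NOT NULL REFERENCES organization_unit(id),
    line1   TEXT NOT NULL
);
""")

conn.execute("INSERT INTO organization_unit VALUES (1, 'company')")
conn.execute("INSERT INTO company VALUES (1, 'Acme')")
conn.execute("INSERT INTO address (unit_id, line1) VALUES (1, '1 High Street')")

# An address that points at a unit that does not exist is refused by the database.
try:
    conn.execute("INSERT INTO address (unit_id, line1) VALUES (99, 'Nowhere')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

Unlike the typeID/genericID idea, the address table here has a real foreign key, so referential integrity is kept without triggers.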
It sounds like you could use Junction tables http://en.wikipedia.org/wiki/Junction_table.
They will give you the flexibility you need to maintain your foreign key constraints, as well as to share addresses between levels or entities if that is desired.
You would have one each for Company_Address, Product_Address, and Project_Address.
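Roughly what the junction tables could look like, sketched with Python's sqlite3 module as a stand-in for SQL Server. The column names, the sample rows and the addresses_for_project helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product (id INTEGER PRIMARY KEY, company_id INTEGER NOT NULL REFERENCES company(id));
CREATE TABLE project (id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL REFERENCES product(id));
CREATE TABLE address (id INTEGER PRIMARY KEY, line1 TEXT NOT NULL);

-- One junction table per level; each row links one owner to one address.
CREATE TABLE company_address (
    company_id INTEGER NOT NULL REFERENCES company(id),
    address_id INTEGER NOT NULL REFERENCES address(id),
    PRIMARY KEY (company_id, address_id)
);
CREATE TABLE product_address (
    product_id INTEGER NOT NULL REFERENCES product(id),
    address_id INTEGER NOT NULL REFERENCES address(id),
    PRIMARY KEY (product_id, address_id)
);
CREATE TABLE project_address (
    project_id INTEGER NOT NULL REFERENCES project(id),
    address_id INTEGER NOT NULL REFERENCES address(id),
    PRIMARY KEY (project_id, address_id)
);

INSERT INTO company VALUES (1, 'Acme');
INSERT INTO product VALUES (10, 1);
INSERT INTO project VALUES (100, 10);
INSERT INTO address VALUES (1, 'Head office'), (2, 'Product depot'), (3, 'Project site');
INSERT INTO company_address VALUES (1, 1);
INSERT INTO product_address VALUES (10, 2);
INSERT INTO project_address VALUES (100, 3);
""")

def addresses_for_project(project_id):
    """Return (level, line1) for a project's own addresses plus its parents' addresses."""
    return conn.execute("""
        SELECT 'project' AS level, a.line1
        FROM project_address AS pa
        JOIN address AS a ON a.id = pa.address_id
        WHERE pa.project_id = :pid
        UNION ALL
        SELECT 'product', a.line1
        FROM project AS pj
        JOIN product_address AS pa ON pa.product_id = pj.product_id
        JOIN address AS a ON a.id = pa.address_id
        WHERE pj.id = :pid
        UNION ALL
        SELECT 'company', a.line1
        FROM project AS pj
        JOIN product AS pr ON pr.id = pj.product_id
        JOIN company_address AS ca ON ca.company_id = pr.company_id
        JOIN address AS a ON a.id = ca.address_id
        WHERE pj.id = :pid
    """, {"pid": project_id}).fetchall()

print(sorted(addresses_for_project(100)))
```

Because the parent addresses are found by walking project to product to company at query time, moving a project to a different product needs no change to any address row.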
I have an event calendar application with a SQL database behind it, and right now I have 3 tables to represent the events:
Table 1: Holiday
Columns: ID, Date, Name, Location, CalendarID
Table 2: Vacation
Columns: Id, Date, Name, PersonId, WorkflowStatus
Table 3: Event
Columns: Id, Date, Name, CalendarID
So I have "generic events" which go into the Event table and special events like holidays and vacations that go into these separate tables. I am debating consolidating these into a single table and just leaving columns like Location and PersonId blank for the generic events.
Table 1: Event:
Columns : Id, Date, Name, Location, PersonId, WorkflowStatus
Does anyone see any strong positives or negatives to each option? Obviously there will be records with columns that don't necessarily apply, but there is overlap between these three tables.
Either way you construct it, the application will have to cope with variant types. In such a situation I recommend that you use a single representation in the DBMS because the alternative is to require a multiplicity of queries.
So it becomes a question of where you stick the complexity and even in a huge organization, it's really hard to generate enough events to worry about DBMS optimization. Application code is more flexible than hardwired schemata. This is a matter of preference.
If it were my decision, I'd condense them into one table. I'd add a column called "EventType" and populate it as you import the data into the new table to specify the type of event.
That way, you only need to index one table instead of three (if you feel indexes are required), the data is all in one table, and the queries to get the data out would be a little more concise because you wouldn't need to union all three tables together to see what one person has done. I don't see any downside to having it all in one table (although there will probably be one that someone will bring up that I haven't thought of).
How about sub-typing special events to an Event supertype? This way it is easy to later add any new special events.
Data integrity is the biggest downside of putting them in one table. Since these all appear to be fields that would be required, you lose the ability to require them all by default and would have to write a trigger to make sure that data integrity was maintained properly (Yes, this must be maintained in the database and not, as some people believe, by the application. Unless of course you want to have data integrity problems.)
Another issue is that these are only the events you need now. There may be more, and more specialized, events in the future, and breaking code for one type of event because you added a specialized field that only applies to another is a big risk. When you make a change to add some required vacation information, will you be sure to check that it doesn't break the application concerning holidays? Or worse, not error out but show information you didn't want? Are you going to look at the actual screen every time? Unit testing of code alone may not pick up this type of thing, especially if someone was foolish enough to use select * or failed to specify columns in an insert. And frankly, not every organization has a really thorough automated test process in place (the risk could be lower if you do).
I personally would tend to go with Damir Sudarevic's solution. An event table for all the common fields (making it easy to at least get a list of all events) and specialized tables for the fields not held in common, making it simpler to write code that affects only one event and allowing the database to maintain its integrity.
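Here is one way the supertype/subtype layout might look, sketched with Python's sqlite3 module as a stand-in for the real database. The snake_case column names and the sample rows are assumptions; the split of fields follows the three tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: the fields every event has.
CREATE TABLE event (
    id         INTEGER PRIMARY KEY,
    event_date TEXT NOT NULL,
    name       TEXT NOT NULL
);
-- Subtypes: same key as the supertype, and their own fields can be NOT NULL.
CREATE TABLE holiday (
    event_id    INTEGER PRIMARY KEY REFERENCES event(id),
    location    TEXT NOT NULL,
    calendar_id INTEGER NOT NULL
);
CREATE TABLE vacation (
    event_id        INTEGER PRIMARY KEY REFERENCES event(id),
    person_id       INTEGER NOT NULL,
    workflow_status TEXT NOT NULL
);
INSERT INTO event VALUES (1, '2024-12-25', 'Christmas'), (2, '2024-08-05', 'Summer leave');
INSERT INTO holiday VALUES (1, 'UK', 7);
INSERT INTO vacation VALUES (2, 42, 'approved');
""")

# Listing every event needs only the supertype table.
all_events = conn.execute("SELECT id, name FROM event ORDER BY id").fetchall()

# The database itself refuses a vacation without a workflow status.
conn.execute("INSERT INTO event VALUES (3, '2024-09-01', 'Incomplete')")
try:
    conn.execute("INSERT INTO vacation (event_id, person_id) VALUES (3, 42)")
    missing_status_rejected = False
except sqlite3.IntegrityError:
    missing_status_rejected = True
print(all_events, missing_status_rejected)
```

In the single-table design workflow_status would have to be nullable (holidays have none), which is the integrity problem described above.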
Keep them in 3 separate tables and do a UNION ALL in a view if you need to merge the data into one resultset for consumption. How you store the data on disk need not be identical to how you need to consume the data so long as the performance is adequate.
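A small sketch of such a view, run through Python's sqlite3 module so it can be tried directly. The table and column names are the ones from the question; the view name AllEvents, the EventType label column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Holiday  (Id INTEGER PRIMARY KEY, Date TEXT, Name TEXT, Location TEXT, CalendarID INTEGER);
CREATE TABLE Vacation (Id INTEGER PRIMARY KEY, Date TEXT, Name TEXT, PersonId INTEGER, WorkflowStatus TEXT);
CREATE TABLE Event    (Id INTEGER PRIMARY KEY, Date TEXT, Name TEXT, CalendarID INTEGER);

-- Storage stays split; the view exposes only the shared columns as one resultset.
CREATE VIEW AllEvents AS
    SELECT 'holiday' AS EventType, Id, Date, Name FROM Holiday
    UNION ALL
    SELECT 'vacation', Id, Date, Name FROM Vacation
    UNION ALL
    SELECT 'event', Id, Date, Name FROM Event;

INSERT INTO Holiday  VALUES (1, '2024-12-25', 'Christmas', 'UK', 7);
INSERT INTO Vacation VALUES (1, '2024-08-05', 'Summer leave', 42, 'approved');
INSERT INTO Event    VALUES (1, '2024-03-14', 'Team meeting', 7);
""")

merged = conn.execute("SELECT EventType, Name FROM AllEvents ORDER BY Date").fetchall()
print(merged)
```

Note that Id values are only unique within each table here, so the EventType label is needed to tell rows apart in the merged result.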
As you have it now, there are no columns that do not apply to any of the presented entities. If you were to merge the 3 tables into one, you'd have to add at least one field to indicate which columns to expect to be populated, and you would reduce performance. As it stands, when you query for a holiday alone you read only that subset of the data; in a merged table you would have to sift through or index all the rows to get at the same data.
If you did not already have these tables defined you could consider creating one table with the following signature...
create table EventBase (
Id int PRIMARY KEY,
Date date,
Name varchar(50)
)
...and, say, the holiday table with the following signature.
create table holiday (
Id int PRIMARY KEY,
EventId int REFERENCES EventBase(Id), -- links each holiday to its common event row
Location varchar(50),
CalendarId int
)
...and join the two when you needed to do so. Choosing between this and the 3 separate tables you already have depends on how you plan on using the tables and volume but I would definitely not throw all into a single table as is and make things less clear to someone looking at the table definition with no other initiation.
Or combine the common fields and separate out the unique ones:
Table 1: EventCommon
Columns: EventCommonID, Date, Name
Table 2: EventOrHoliday
Columns: EventCommonID, CalendarID, isHoliday
Table 3: Vacation
Columns: EventCommonID, PersonId, WorkflowStatus
with 1->many relationships between EventCommon and the other 2.
I want to write an enterprise software and now I'm in the DB design phase. The software will have some master data such as Suppliers, Customers, Inventories, Bankers...
I am considering 2 options:
Put each of these in its own separate table. Advantage: the table will have all the necessary information for that kind of master file (Customer: name, address, ... / Inventory: type, manufacturer, condition, ...). Disadvantage: not flexible. When I want a new type of master data, such as Insurer, I have to design another table.
Put everything in one table that has a foreign key to another table holding the type of each kind of master data (table 1: id, data_type, code, name, address, ...; table 2: data_type, data_type_name). Advantage: flexible. If I want more master data such as Insurer, I just add a row to table 2 (code: 002, name: Insurer) and then put the details of each insurer into table 1. Disadvantage: table 1 must have enough fields to store every kind of information, including customer name, address, account, inventory manufacturer, inventory quality, ...
So which method do you usually use (or which do you think works better)?
Thank you very much
I would advise creating separate tables for each entity type - it will be a lot easier to maintain in the future when you discover things you want to add for one entity type that don't apply to the others. If all of the entities (Suppliers, Customers, etc) are going to have the same fields and the only difference is their type then you could theoretically use one table. However, I would expect that there would be enough differences between the entity types that it would be worth creating separate tables for each. If there are several fields in common (e.g. address information) you could create a table for the common elements and have a foreign key in the individual tables to the table with the common data (e.g. AddressID).
logically, each "master" entity should be in its own table
if you don't, you'll find joins will become very painful, and your generic lookup table will accumulate all kinds of useless fields
Is there a simple method to decide on what fields and indexes are needed for each table in an app you design?
For example, if it is a webapp that simply lets people create lists (any number of lists, and users can create "things to do" list or "shopping" list), and the user can assign other users to edit the list, and whether the list is viewable publicly or to only certain users, how can the tables be design so that it is very accurate and designed quickly? What about the indexes?
I did that in college and then revisited the question some time ago and have a method, but would like to find out if there are standard and good ways to do it out in the field.
Database design is hard ...
As with many things in life, it's a series of tradeoffs. The first thing you need to decide is what DBMS you will use (MySQL, SQL Server, Oracle, PostgreSQL, one of the "Object-oriented" databases, etc.).
Then you need to decide on normalization v. insane numbers of JOINs to get to your data. Questions like "how much logic will I implement in triggers, stored procedures, in app code, etc" need to be addressed.
There is no "Quick'n'Easy" way to design anything but the most trivial of databases.
'Course, that's just my experience. YMWV.
it is beyond the scope of this answer to fully explain database design
I generally break my design into three parts (part 1 and 2 happen up front, while 3 is usually near the project end)
1) create the tables based on relationships (parent/child/etc)
2) create fields based on content (parent has x attributes, etc)
3) create indexes last based on how you select data from your tables
Haven't heard of any formal approaches to this problem but there are rules of thumb. All nouns and business objects become tables, normalized of course. And I'd think the attributes sort of speak for themselves. I guess?
As for indexes, it just comes with working with the data. Any column that's joined off of deserves an index (maybe even clustered). It's very... depends. But there are patterns. But other than optimizing for joins, many indexes are directly related to how the data is used, and this isn't something that can be provided by rule of thumb. Like if you look up users by pk and elsewhere by last_name, last_name deserves an index.
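To make the last_name case concrete, here is a small sketch that creates the index and asks the engine for its query plan. It uses Python's sqlite3 module purely as a convenient stand-in; the table layout and the index name idx_users_last_name are assumptions, and other DBMSs have their own way to show a plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,   -- lookups by pk are already indexed
    last_name TEXT NOT NULL
);
-- Added because the application also looks users up by last_name.
CREATE INDEX idx_users_last_name ON users (last_name);
""")

# EXPLAIN QUERY PLAN reports how SQLite intends to run the query;
# the last column of each row is a human-readable description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT user_id FROM users WHERE last_name = ?",
    ("Smith",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

Checking the plan for the queries the app really runs is a reasonable way to decide which indexes earn their keep, rather than adding them by guesswork.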
I think the solution is a subjective one. When I have to design tables I look at the Java object that will represent that particular data model and go from there. You'll find a lot of frameworks (Django, CakePHP, RoR) have you develop the model and the frameworks will build the corresponding tables.
So I would suggest evaluating what functionality and data you need to store and develop your tables from that. Also look into whether the tool set you have at your disposal offers to generate the tables for you from the object structure.
I would go for the straightforward (almost) normalized design:
-- users is created first because lists and list_permissions reference it
CREATE TABLE users (
userid serial primary key,
name varchar
);
CREATE TABLE lists (
listid serial primary key,
name varchar,
ownerid int references users(userid)
);
CREATE TABLE list_items (
listid int references lists(listid),
value varchar,
date timestamp
);
CREATE TABLE permissions (
permissionid serial primary key,
description varchar
);
CREATE TABLE list_permissions (
listid int references lists(listid),
permissionid int references permissions(permissionid),
userid int references users(userid)
);
Which indexes to create would depend on what are the actual most used queries and how are they performing. For instance, if you query a lot on the lists and list_items (likely) you'd want an index on listid and on name, if you'll be searching by name.
Just some ideas. Hope they're helpful.
I'd try not to lock yourself in if you're still trying to see what works.
Just from your description, you'd want a table for your users' information, as well as:
tbl_lists:
ID_list (primary key)
UserID (foreign key to list owner)
ListName
tbl_listItems:
ID_listItem (primary key)
ListID (foreign key to list)
ItemDescription
tbl_permissions:
ID_permission (primary key)
ListID
UserID (foreign key to user you're granting permission to)
PermissionTypeID (what kind of permission)
tbl_permissionTypes:
ID_permissionType (primary key)
Description ("can view", "can edit", etc.)
The more flexible you can make things while you're designing, the better. You can optimize later.
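For what it's worth, here is how that layout might be queried, sketched with Python's sqlite3 module so it runs as-is. The tbl_users table, its columns and the sample rows are assumptions; the other table and column names follow the outline above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE tbl_users (ID_user INTEGER PRIMARY KEY, UserName TEXT NOT NULL);
CREATE TABLE tbl_lists (
    ID_list  INTEGER PRIMARY KEY,
    UserID   INTEGER NOT NULL REFERENCES tbl_users(ID_user),
    ListName TEXT NOT NULL
);
CREATE TABLE tbl_permissionTypes (
    ID_permissionType INTEGER PRIMARY KEY,
    Description       TEXT NOT NULL
);
CREATE TABLE tbl_permissions (
    ID_permission    INTEGER PRIMARY KEY,
    ListID           INTEGER NOT NULL REFERENCES tbl_lists(ID_list),
    UserID           INTEGER NOT NULL REFERENCES tbl_users(ID_user),
    PermissionTypeID INTEGER NOT NULL REFERENCES tbl_permissionTypes(ID_permissionType)
);
INSERT INTO tbl_users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO tbl_lists VALUES (1, 1, 'Shopping');
INSERT INTO tbl_permissionTypes VALUES (1, 'can view'), (2, 'can edit');
INSERT INTO tbl_permissions (ListID, UserID, PermissionTypeID)
VALUES (1, 2, 2), (1, 3, 1);
""")

# Which users were granted edit rights on list 1?
editors = conn.execute("""
    SELECT u.UserName
    FROM tbl_permissions AS p
    JOIN tbl_users AS u ON u.ID_user = p.UserID
    JOIN tbl_permissionTypes AS t ON t.ID_permissionType = p.PermissionTypeID
    WHERE p.ListID = ? AND t.Description = 'can edit'
""", (1,)).fetchall()
print(editors)
```

A new kind of permission is then just another row in tbl_permissionTypes, with no schema change.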
If you want to keep things very simple and are not too concerned with normalizing, you could create one big table that stores the main object your webapp is based around (e.g. lists) and have other, smaller supporting tables link to the big table (e.g. tbl_listType, tbl_permission, tbl_list_items).
Then when you write queries, you almost certainly include the main table and you can link in other supporting tables for more granular details.
I am currently in the process of looking at restructuring our contact management database and I wanted to hear people's opinions on solving the problem of a number of contact types having shared attributes.
Basically we have 6 contact types which include Person, Company and Position # Company.
In the current structure all of these have an address however in the address table you must store their type in order to join to the contact.
This consistent requirement to join on contact type gets frustrating after a while.
Today I stumbled across a post discussing "Table Inheritance" (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server).
Basically you have a parent table and a number of sub tables (in this case each contact type). From there you enforce integrity so that a sub table must have a master equivalent where its type is defined.
The way I see it, by this method I would no longer need to store the type in tables like address, as the id is unique across all types.
I just wanted to know if anybody had any feelings on this method, whether it is a good way to go, or perhaps alternatives?
I'm using SQL Server 05 & 08 should that make any difference.
Thanks
Ed
I designed a database just like the link you provided suggests. The case was to store the data for many different technical reports. The number of report types is undefined and will probably grow to about 40 different types.
I created one master report table, that has an autoincrement primary key. That table contains all common information like customer, testsite, equipmentid, date etc.
Then I have one table for each report type that contains the specific information relating to that report type. That table has the same primary key as the master and references the master as well.
My idea for splitting this into different tables with a 1:1 relation (which normally would be a no-no) was to avoid getting one single table with a huge number of columns, which gets very difficult to maintain as you're constantly adding columns.
My design with table inheritance gave me segmented data and expandability without being difficult to maintain. The only thing I had to do was to write a special save method to handle writing to two tables automatically. So far I'm very happy with the design and haven't really found any drawbacks, except for a little more complicated save method.
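This is not the original save method, just a guess at its general shape, sketched with Python's sqlite3 module. The pressure_report table, its max_pressure column and the save_pressure_report function are all hypothetical; the point is only that both inserts share one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Master table: common fields, autoincrement key.
CREATE TABLE report (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    customer TEXT NOT NULL,
    testsite TEXT NOT NULL
);
-- One table per report type, sharing the master's primary key (hypothetical type).
CREATE TABLE pressure_report (
    id           INTEGER PRIMARY KEY REFERENCES report(id),
    max_pressure REAL NOT NULL
);
""")

def save_pressure_report(customer, testsite, max_pressure):
    """Write the master row and the type-specific row in one transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        cur = conn.execute(
            "INSERT INTO report (customer, testsite) VALUES (?, ?)",
            (customer, testsite),
        )
        conn.execute(
            "INSERT INTO pressure_report (id, max_pressure) VALUES (?, ?)",
            (cur.lastrowid, max_pressure),
        )
        return cur.lastrowid

report_id = save_pressure_report("Acme", "Site A", 12.5)
print(report_id)
```

Wrapping both writes in a single transaction means a failure on the type-specific insert does not leave a master row with no detail row behind.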
Google on "gen-spec relational modeling". You'll find a lot of articles discussing exactly this pattern. Some of them focus on table design, while others focus on an object oriented approach.
Table inheritance pops up in a few of them.
I know this won't help much now, but initially it may have been better to have an Entity table rather than 6 different contact types. Then each Entity could have as many addresses as necessary and there would be no need for type in the join.
You'll still have the problem that if you want the sub-type fields and you have only the master contact, you'll have to know what table to go looking at - or else join to all of them. But otherwise this is a workable solution to a common problem.
Another possibility (fairly similar in structure, but different in how you think of it) is to simply put all your contacts into one table. Then for the more specific fields (birthday say for people and department for position#company) create separate tables that are associated with that contact.
Contact Table
--------------
Name
Phone Number
Address Table
-------------
Street / state, etc
ContactId
ContactBirthday Table
--------------
Birthday
ContactId
Departments Table
-----------------
Department
ContactId
It requires a different way of thinking of things though - instead of thinking of people vs. companies, you think of the various functional requirements for the task at hand - if you want to send out birthday cards, get all the contacts that have birthdays associated with them, etc..
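The birthday-card case could look something like this, sketched with Python's sqlite3 module. The table names follow the outline above; the ContactId key on Contact, the exact column names and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contact (
    ContactId   INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL,
    PhoneNumber TEXT
);
-- Only contacts that actually have a birthday get a row here.
CREATE TABLE ContactBirthday (
    ContactId INTEGER PRIMARY KEY REFERENCES Contact(ContactId),
    Birthday  TEXT NOT NULL
);
INSERT INTO Contact VALUES (1, 'Ann Person', '555-0100'), (2, 'Acme Ltd', '555-0101');
INSERT INTO ContactBirthday VALUES (1, '1985-06-15');
""")

# "Who gets a birthday card?" is an inner join; no contact-type column is involved.
card_recipients = conn.execute("""
    SELECT c.Name, b.Birthday
    FROM Contact AS c
    JOIN ContactBirthday AS b ON b.ContactId = c.ContactId
""").fetchall()
print(card_recipients)
```

The company contact simply has no ContactBirthday row, so it drops out of the join without any type check.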
I'm going to go out on a limb here and suggest you should rethink your normalization strategy (as you seem to be lucky enough to be able to rethink your schema quite fundamentally). If you typically store an address for each contact, then your contact table should have the address fields in it. Alternatively if the address is stored per company then the address should be stored in the company table and your contacts linked to that company.
If your contacts only have one address, or one (or even 3, just not 'many') instance of the other fields, think about rationalizing them into a single table. In my experience having a few null fields is a far better alternative than needing left joins to data you aren't sure exists.
Fortunately for anyone who vehemently disagrees with me you did ask for opinions! :) IMHO you should only normalize when you really need to. Where you are rethinking schemas, denormalization should be considered at every opportunity.
When you have a 7th type, you'll have to create another table.
I'm going to try this approach. Yes, you have to create new tables when you have a new type, but since this table will probably have different columns, you'll end up doing this anyway if you don't use this scheme.
If the tables that inherit the master don't differentiate much from one another, I'd recommend you try another approach.
May I suggest that we just add a Type table? That is, a person has an address, name, etc. Then, as each use case (student, teacher, ...) presents itself, a PersonType table links an entry in the person table to n types, and new tables such as teacher, alien and singer are added as the system evolves...