Share an "identity" across tables - sql-server

I am working on a database that has an existing Individuals table that every user type derives from. In other words, there are tables like:
Individual: FirstName, LastName, Email, <lots more>
Employee: IndividualId
Customer: IndividualId
etc.
Now, I want to add a new type of user (WeirdPerson) that does not derive from Individual. (WeirdPerson has significantly less data associated with it than any Individual, and I really don't want to set practically every field in Individual to null for a WeirdPerson.)
I need a key field to use on a table that will have entries from WeirdPersons and entries from Individuals. This suggests map tables like so:
MashedupIndividuals: MashedupId, IndividualId
MashedupWeirdPerson: MashedupId, WeirdPersonId
I want MashedupId to be an auto-generated field. Since I'm using TSQL, an identity seems a good choice. Except that MashedupId is split across two tables. I considered yet another table:
MashedupIds: MashedupId
Set MashedupId to be an identity, and then make it a foreign key in MashedupIndividuals and MashedupWeirdPerson.
Is this the best way to proceed? How would you solve this?
EDIT: To clarify, the only piece of information I have for a WeirdPerson is an email address. I considered pulling the email field out of Individual, and then making a new GlobalPerson table with only GlobalPersonId and Email. The GlobalPerson table (or whatever better name I use) doesn't feel as natural as separating WeirdPerson as an entirely different type. However... I am willing to reconsider this position.

I would suggest a table to host data common to all people in your application. Then you could have additional tables for specific types of people and link them back to your common table.
tblPerson
PersonID (pk)
name, address, birthday, etc.
tblEmployee
EmployeeID (pk)
PersonID (fk to tblPerson)
Title, OfficePhone, Email, etc.
tblCustomer
CustomerID (pk)
PersonID (fk to tblPerson)
Other fields...
EDIT:
Here are some definitions more applicable to your question (and also more fun with these weird people). The key is establishing the data that weird people and normal people share and then establishing the tables/relationships to support that model. It might be necessary to move fields that are not applicable to weird people from tblIndividual to tblNormalPerson.
tblIndividual
IndividualID (pk)
Other fields for data applicable to both weird/normal people
tblWeirdPerson
WeirdPersonID (pk)
IndividualID (fk to tblIndividual)
NumberOfHeads (applicable to weird people)
tblNormalPerson
NormalPersonID (pk)
IndividualID (fk to tblIndividual)
FirstName (other fields applicable to normal people)
LastName
Etc...
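
The layout above can be tried out with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the types and auto-numbering differ from T-SQL. The Email column (as the shared data), the UNIQUE constraints, and the helper function names are assumptions made for illustration, not part of the answer.

```python
import sqlite3

# SQLite stands in for SQL Server; table names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE tblIndividual (
    IndividualID INTEGER PRIMARY KEY,  -- auto-generated, like an IDENTITY
    Email        TEXT NOT NULL         -- assumed shared data
);
CREATE TABLE tblWeirdPerson (
    WeirdPersonID INTEGER PRIMARY KEY,
    IndividualID  INTEGER NOT NULL UNIQUE
                  REFERENCES tblIndividual (IndividualID),
    NumberOfHeads INTEGER
);
CREATE TABLE tblNormalPerson (
    NormalPersonID INTEGER PRIMARY KEY,
    IndividualID   INTEGER NOT NULL UNIQUE
                   REFERENCES tblIndividual (IndividualID),
    FirstName      TEXT,
    LastName       TEXT
);
""")


def add_weird_person(email, heads):
    """Insert the shared row first, then the subtype row that points at it."""
    cur = conn.execute("INSERT INTO tblIndividual (Email) VALUES (?)", (email,))
    individual_id = cur.lastrowid
    conn.execute(
        "INSERT INTO tblWeirdPerson (IndividualID, NumberOfHeads) VALUES (?, ?)",
        (individual_id, heads),
    )
    return individual_id


def add_normal_person(email, first_name, last_name):
    cur = conn.execute("INSERT INTO tblIndividual (Email) VALUES (?)", (email,))
    individual_id = cur.lastrowid
    conn.execute(
        "INSERT INTO tblNormalPerson (IndividualID, FirstName, LastName) "
        "VALUES (?, ?, ?)",
        (individual_id, first_name, last_name),
    )
    return individual_id


weird_id = add_weird_person("zaphod@example.com", 2)
normal_id = add_normal_person("arthur@example.com", "Arthur", "Dent")
```

Because both kinds of people draw their IndividualID from the same table, that one column can serve as the shared key for any table that needs to reference either kind.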

You can use a uniqueidentifier field for your id. Use the NEWID() function to generate new values; these are, for all practical purposes, unique across multiple tables.
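
A rough sketch of the same idea outside SQL Server: Python's uuid.uuid4() plays the role of NEWID() here, and SQLite stores the value as text because it has no uniqueidentifier type. The two map tables come from the question; the new_id helper is an assumed name.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MashedupIndividuals (
    MashedupId   TEXT PRIMARY KEY,   -- uniqueidentifier in T-SQL
    IndividualId INTEGER NOT NULL
);
CREATE TABLE MashedupWeirdPerson (
    MashedupId    TEXT PRIMARY KEY,
    WeirdPersonId INTEGER NOT NULL
);
""")


def new_id():
    # Stand-in for NEWID(): a random (version 4) UUID rendered as a string.
    return str(uuid.uuid4())


individual_key = new_id()
weird_key = new_id()
conn.execute("INSERT INTO MashedupIndividuals VALUES (?, ?)", (individual_key, 1))
conn.execute("INSERT INTO MashedupWeirdPerson VALUES (?, ?)", (weird_key, 1))
```

No coordination between the two tables is needed, since each key is generated independently. The trade-off is a wider key than an integer identity.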

You could have a table with three fields, one of which is always null:
MashedupId, IndividualId, WeirdPersonId
or with an ID field and ID type (individual/weird)
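
Here is a minimal sketch of the three-field variant, run against SQLite from Python as a stand-in for SQL Server. The table name Mashedup, the CHECK constraint that insists on exactly one non-null reference, and the try_insert helper are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Mashedup (
    MashedupId    INTEGER PRIMARY KEY,   -- auto-generated key
    IndividualId  INTEGER,
    WeirdPersonId INTEGER,
    -- exactly one of the two references must be filled in
    CHECK ((IndividualId IS NOT NULL AND WeirdPersonId IS NULL)
        OR (IndividualId IS NULL AND WeirdPersonId IS NOT NULL))
)
""")


def try_insert(individual_id, weird_person_id):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO Mashedup (IndividualId, WeirdPersonId) VALUES (?, ?)",
            (individual_id, weird_person_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, None),     # individual only
    try_insert(None, 1),     # weird person only
    try_insert(1, 1),        # both set
    try_insert(None, None),  # neither set
]
```

In this sketch only the first two inserts are accepted; the constraint keeps the "one of which is always null" rule from depending on application code.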

Related

Modelling additional fields

I am trying to pull information about people from ten local data sources for a law enforcement organisation. I have created a table called Person:
CREATE TABLE Person
(
    ID int identity,
    DateOfBirth datetime,
    Occupation varchar(100),
    LastVisit datetime,
    datecreated datetime,
    datemodified datetime,
    primary key (ID)
);
Each of the ten databases holds: DateOfBirth, Occupation, LastVisit, datecreated and datemodified so it is simple to create this table.
Some of the databases contain other information. For example, database 1 contains addresses and database 2 contains vehicles and database 3 contains property and database 4 contains intelligence etc.
I am trying to think of the best way to model these requirements. I believe there are two options:
Create tables for the additional information e.g. Vehicles table, addresses table, property table etc. There would be a zero to many relationship between Person and each of the additional tables.
Use a more dynamic approach i.e. CustomTable1, CustomTable2, CustomTable3 etc. CustomTable1 would have CustomField1, CustomField2 etc. This approach would mean introducing a layer of abstraction above the additional tables. Is there a design pattern for this that I am not aware of?
(whispering) Are you a Java programmer?
If you build a table to store data about vehicles, and you name it "CustomTable17", everybody that writes queries will curse you until your dying day. You will even curse yourself.
Don't do that. In your case, you know every attribute you need to model before you even start. You don't need "more dynamic". You don't need a "layer of abstraction".
Store data about vehicles in a table named "vehicles", unless there's a compelling reason to use a different name. "A more dynamic approach" and "a layer of abstraction" aren't compelling reasons to use a different name.
"This table isn't for just any vehicle. It's only for impounded vehicles." Now that would be a compelling reason to use a different name. But we're talking about a name like "impounded_vehicles", not a name like "CustomTable135".
When I've had to consolidate data from multiple sources, I have sometimes found it useful to store the source of each row. Give that some thought.
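
To make that concrete, here is a small sketch using SQLite from Python in place of SQL Server. The Vehicles columns, the SourceSystem column, and the sample rows are all assumptions for illustration; only Person and its first columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person (
    ID           INTEGER PRIMARY KEY,
    DateOfBirth  TEXT,
    Occupation   TEXT,
    SourceSystem TEXT NOT NULL           -- which of the ten databases it came from
);
CREATE TABLE Vehicles (                  -- a plainly named table, not CustomTable17
    ID           INTEGER PRIMARY KEY,
    PersonID     INTEGER NOT NULL REFERENCES Person (ID),
    Registration TEXT NOT NULL,          -- assumed column
    SourceSystem TEXT NOT NULL
);
INSERT INTO Person (ID, DateOfBirth, Occupation, SourceSystem)
VALUES (1, '1980-01-01', 'Driver', 'database1'),
       (2, '1975-06-30', 'Clerk',  'database3');
INSERT INTO Vehicles (PersonID, Registration, SourceSystem)
VALUES (1, 'ABC123', 'database2'),
       (1, 'XYZ789', 'database2');
""")

# Zero-to-many: a LEFT JOIN keeps people who have no vehicles at all.
rows = conn.execute("""
SELECT p.ID, COUNT(v.ID)
FROM Person p
LEFT JOIN Vehicles v ON v.PersonID = p.ID
GROUP BY p.ID
ORDER BY p.ID
""").fetchall()
```

With the sample data, person 1 has two vehicles and person 2 has none, which is the zero-to-many relationship described in the first option.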

Using Multiple Databases

A company is hired by another company for helping in a certain field.
So I created the following tables:
Companies: id, company name, company address
Administrators: (in relation with companies) id, company_id, username, email, password, fullname
Then, each company has some workers in it, I store data about workers.
Each worker has a profession, a signed agreement type, and some other common attributes.
Now, the parent tables and data in it for workers (Agreement Types, Professions, Other Common Things) are going to be the same for each company.
Should I create 1 new database for each company? Or store All data into the same database?
Thanks.
Since agreement types and professions are going to be the same for each company, I would suggest a lookup table such as "AgreementTypes" with columns "ID" and "Type", and referencing its "ID" column from the "Workers" table. I don't think a new database is required; relational databases are meant to eliminate data redundancy and to create appropriate relationships between entities.
If you imagine one database per company, each database ends up with a single record in its "Company" table, with "Administrators" and "Workers" associated with that single record. Other common entities such as "AgreementTypes" would sit in other tables, repeated in every database.
So any addition or modification to an agreement type would have to be repeated in every database. Similarly, if a new entity needs to be linked to the "Company" entity, every database has to be revisited, assuming these entities belong to ONE application.
You should have one single database, with a structure something like this (this is somewhat over-simplified, but you get the idea):
Companies
CompanyID PK
CompanyName
CompanyAddress
OtherCompanySpecificData
Workers
WorkerID PK
CompanyID FK
LastName
FirstName
DOB
AgreementTypeID FK
ProfessionID FK
UserID FK - A worker may need more than one user account
Other UserSpecificData
Professions
ProfessionID PK
Profession
OtherProfessionStuff
AgreementType
AgreementTypeID PK
AgreementTypeName
Description
OtherAgreementStuff
Users
UserID PK -- A Worker may need more than 1 user account
WorkerID FK
UserName
Password
AccountStatus
Groups
GroupID PK
GroupName
OtherGroupSpecificData
UserGroups --Composite Key with UserID and GroupID
UserID PK
GroupID PK
Obviously, things will grow a little more complex, and I don't know your requirements or business model. For example, if companies can have different departments, you may wish to create a CompanyDepartment table, and then be able to assign workers to various departments.
And so on.
The more atomic you can make your data structures, the more flexible your database will be as it grows. Google the term Database Normalization, and specifically the Third Normal Form (3NF), which is generally considered the minimum for sound database design.
Hope that helps. Feel free to elaborate if you are stuck - there is a lot of great help here on SO.
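
A cut-down version of that structure can be run as a sketch with SQLite from Python (a stand-in for whatever database server is actually used). Only a few of the columns above are included, and the sample companies, workers, and agreement names are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Companies (
    CompanyID   INTEGER PRIMARY KEY,
    CompanyName TEXT NOT NULL
);
CREATE TABLE Professions (
    ProfessionID INTEGER PRIMARY KEY,
    Profession   TEXT NOT NULL UNIQUE
);
CREATE TABLE AgreementType (
    AgreementTypeID   INTEGER PRIMARY KEY,
    AgreementTypeName TEXT NOT NULL UNIQUE
);
CREATE TABLE Workers (
    WorkerID        INTEGER PRIMARY KEY,
    CompanyID       INTEGER NOT NULL REFERENCES Companies (CompanyID),
    LastName        TEXT NOT NULL,
    ProfessionID    INTEGER NOT NULL REFERENCES Professions (ProfessionID),
    AgreementTypeID INTEGER NOT NULL REFERENCES AgreementType (AgreementTypeID)
);
INSERT INTO Companies VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Professions VALUES (1, 'Welder');
INSERT INTO AgreementType VALUES (1, 'Full-time');
INSERT INTO Workers VALUES (1, 1, 'Smith', 1, 1), (2, 2, 'Jones', 1, 1);
""")

# One change to the shared lookup row is seen by every company's workers.
conn.execute(
    "UPDATE AgreementType SET AgreementTypeName = 'Permanent' "
    "WHERE AgreementTypeID = 1"
)
rows = conn.execute("""
SELECT c.CompanyName, w.LastName, a.AgreementTypeName
FROM Workers w
JOIN Companies c     ON c.CompanyID = w.CompanyID
JOIN AgreementType a ON a.AgreementTypeID = w.AgreementTypeID
ORDER BY c.CompanyName
""").fetchall()
```

With one database per company, the same rename would have to be applied once per database.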

Polymorphic ORM database pattern

I remember when - a long time ago - I was messing around with the Java ActiveObjects ORM, I came across a database pattern it claimed to support.
However, it is very difficult to find the pattern's name by searching for the general idea, so I would really appreciate it if someone could give me the name of this pattern, and some thoughts on the "cleanness" of using it.
The pattern was defined as such:
Table:
reference_type <enum>
reference <integer>
...
... where the value of the field reference_type would determine the type (and thus the table) to which was being referred. Thus:
User:
location_type <l&l, address, city, country>
location <integer>
...
... where depending on the value of the location_type field, the foreign key location would refer to either the l&l, address, city or country table.
You're having difficulty finding it because it's not a real (in the sense of widely adopted and encouraged) database design pattern.
Stay away from patterns like this. While ORMs make mapping database tables to types easier, tables are not types, and vice versa. While it's not clear what the model you've described is supposed to do, you should not have columns that serve as fake foreign keys to multiple tables (when I say "fake", I mean that you're storing a simple identifier value that corresponds to the primary key of another table, but you can't actually define the column as a foreign key).
Model your database to represent the data, model your objects to represent the process, and use your ORM and intermediate layers to do the translation; don't try to push the database into your code, and don't push your code into the database.
Edit in response to comment
You're mixing database and OO terminology; while I'm not familiar with the syntax you're using to define that function, I'm assuming it's an instance function on the User type called getLocation that takes no parameters and returns a Location object. Databases don't support the concepts of instance (or any type-based) functions; relational databases can have user-defined functions, but these are simple procedural functions that take parameters and return either values or result sets. They do not correspond to particular tables or fields in any way, other than the fact that you can use them within the body of the function.
That being said, there are two questions to answer here: how to do what you've asked, and what might be a better solution.
For what you've asked, it sounds like you have a supertype-subtype relationship, which is a standard database design pattern. In this case, you have a single supertype table that represents the parent:
Location
---------------
LocationID (PK)
...other common attributes
(Note here that I'm using LocationID for the sake of simplicity; you should have more specific and logical attributes to define the primary key, if possible)
Then you have one or more tables that define subtypes:
Address
-----------
LocationID (PK, FK to Location)
...address-specific attributes
Country
-----------
LocationID (PK, FK to Location)
...country-specific attributes
If a specific instance of Location can only be one of the subtypes, then you should add a discriminator value to the parent table (Location) that indicates which of the subtypes it corresponds to. You can use CHECK constraints to ensure that only valid values are in this field for a given row.
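
As a rough sketch of the discriminator idea, here it is in SQLite driven from Python (standing in for the poster's database). The CHECK on LocationType is what the paragraph above describes. Repeating LocationType in each subtype table and using a composite foreign key is an extra technique assumed here to tie each subtype row to a matching parent row. The Street and Name columns and the rejected helper are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Location (
    LocationID   INTEGER PRIMARY KEY,
    -- discriminator: 'A' = address, 'C' = country
    LocationType TEXT NOT NULL CHECK (LocationType IN ('A', 'C')),
    UNIQUE (LocationID, LocationType)    -- target of the composite FKs below
);
CREATE TABLE Address (
    LocationID   INTEGER PRIMARY KEY,
    LocationType TEXT NOT NULL DEFAULT 'A' CHECK (LocationType = 'A'),
    Street       TEXT NOT NULL,
    FOREIGN KEY (LocationID, LocationType)
        REFERENCES Location (LocationID, LocationType)
);
CREATE TABLE Country (
    LocationID   INTEGER PRIMARY KEY,
    LocationType TEXT NOT NULL DEFAULT 'C' CHECK (LocationType = 'C'),
    Name         TEXT NOT NULL,
    FOREIGN KEY (LocationID, LocationType)
        REFERENCES Location (LocationID, LocationType)
);
INSERT INTO Location VALUES (1, 'A'), (2, 'C');
INSERT INTO Address (LocationID, Street) VALUES (1, '1 Main St');
INSERT INTO Country (LocationID, Name) VALUES (2, 'France');
""")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# An unknown discriminator value fails the CHECK on Location.
bad_type = rejected("INSERT INTO Location VALUES (3, 'X')")
# Location 1 is an address, so it cannot also get a Country row.
wrong_subtype = rejected("INSERT INTO Country (LocationID, Name) VALUES (1, 'Spain')")
```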
In the end, though, it sounds like you might be better served with a hybrid approach. You're fundamentally representing two different types of locations, from what I can see:
Coordinate-based locations (L&L)
Municipal/Postal/Etc.-based locations (Country, City, Address), and each of these is simply a more specific version of the previous
Given this, a simple model would look like this:
Location
------------
LocationID (PK)
LocationType (non-nullable) ('C' for coordinate, 'P' for postal)
LocationCoordinate
------------------
LocationID (PK; FK to Location)
Latitude (non-nullable)
Longitude (non-nullable)
LocationPostal
------------------
LocationID (PK, FK to Location)
Country (non-nullable)
City (nullable)
Address (nullable)
Now the only problem that remains is that we have nullable columns. If you want to keep your queries simple but take (justified!) flak from people about leaving nullable columns, then you can leave it as-is. If you want to go to what most people would consider a better-designed database, you can move to 6NF for our two nullable columns. Doing this will also have the nice side-effect of giving us a little more control over how these fields are populated without having to do anything extra.
Our two nullable fields are City and Address. I am going to assume that having an Address without a City would be nonsense. In this case, we remove these two attributes from the LocationPostal table and create two more tables:
LocationPostalCity
------------------
LocationID (PK; FK to LocationPostal)
City (non-nullable)
LocationPostalCityAddress
-------------------------
LocationID (PK; FK to LocationPostalCity)
Address (non-nullable)
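
The chain of tables can be sketched as follows, again with SQLite from Python as a stand-in. To keep the sketch self-contained, the parent Location table and the LocationCoordinate branch are left out, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE LocationPostal (
    LocationID INTEGER PRIMARY KEY,
    Country    TEXT NOT NULL
);
CREATE TABLE LocationPostalCity (
    LocationID INTEGER PRIMARY KEY REFERENCES LocationPostal (LocationID),
    City       TEXT NOT NULL
);
CREATE TABLE LocationPostalCityAddress (
    LocationID INTEGER PRIMARY KEY REFERENCES LocationPostalCity (LocationID),
    Address    TEXT NOT NULL
);
INSERT INTO LocationPostal VALUES (1, 'France'), (2, 'Spain');
INSERT INTO LocationPostalCity VALUES (1, 'Paris');
INSERT INTO LocationPostalCityAddress VALUES (1, '1 Rue de Rivoli');
""")

# Location 2 has no city row, so an address for it violates the foreign key.
try:
    conn.execute("INSERT INTO LocationPostalCityAddress VALUES (2, 'Calle Mayor 1')")
    address_without_city_rejected = False
except sqlite3.IntegrityError:
    address_without_city_rejected = True

# Reading everything back needs outer joins; absent rows surface as NULLs.
rows = conn.execute("""
SELECT p.LocationID, p.Country, c.City, a.Address
FROM LocationPostal p
LEFT JOIN LocationPostalCity c        ON c.LocationID = p.LocationID
LEFT JOIN LocationPostalCityAddress a ON a.LocationID = c.LocationID
ORDER BY p.LocationID
""").fetchall()
```

The foreign key from the address table to the city table is what gives the "no Address without a City" rule without any extra code; the cost is the pair of outer joins on every read.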
Seems to me that city and country would be part of the address table, and that L&L wouldn't be mutually exclusive with address (you might have both...), so, why limit yourself like that to one or the other?
Furthermore, this would prevent the location column from enforcing referential integrity, would it not, since it wouldn't always reference the same table?

If my entity has a (0-1):1 relation to another entity, how would I model that in the database?

For example, let's say I have an entity called user and an entity called profile_picture. A user may have none or one profile picture.
So I thought, I would just create a table called "user" with these fields:
user: user_id, profile_picture_id
(I left out all other attributes like name, email, etc., to simplify this)
Ok, so if a user has no profile_picture, its id would be NULL in my relational model. Now someone told me that I have to avoid setting anything to NULL, because NULL is "bad".
What do you think about this? Do I have to take off that profile_picture_id from the user table and create a link-table like user__profile_picture with user_id, profile_picture_id?
Which would be considered to be "better practice" in database design?
This is a perfectly reasonable model. True, you can take the approach of creating a join table for a 1:1 relationship (or, somewhat better, you could put user_id in the profile_picture table), but unless you think that very few users will have profile pictures then that's likely a needless complication.
Readability is an important component in relational design. Do you consider the profile picture to be an attribute of the user, or the user to be an attribute of the profile picture? You start from what makes logical sense, then optimize away the intuitive design as you find it necessary through performance testing. Don't prematurely optimize.
NULL isn't "bad". It means "I don't know." It's not wrong for you or your schema to admit it.
"NULL is bad" is a rather poor excuse for a reason to do (or not do) something.
That said, you may want to model this as a dependent table, where the user_id is both the primary key and a foreign key to the existing table.
Something like this:
Users                  UserPicture                 Picture
----------------       --------------------        -------------------
| User_Id (PK) |_______| User_Id (PK, FK) |________| Picture_Id (PK) |
| ...          |       | Picture_Id (FK)  |        | ...             |
----------------       --------------------        -------------------
Or, if pictures are dependent objects (don't have a meaningful lifetime independent of users) merge the UserPicture and Picture tables, with User_Id as the PK and discard the Picture_Id.
Actually, looking at it again, this really doesn't gain you anything - you have to do a left join vs. having a null column, so the other scenario (put the User_Id in the Picture table) or just leave the Picture_Id right in the Users table both make just as much sense.
Your user table should not have a nullable field called profile_picture_id. It would be better to have a user_id column in the profile_picture table. It should of course be a foreign key to the user table.
Since when is a nullable foreign key relationship "bad?" Honestly introducing another table here seems kind of silly since there's no possibility to have more than one profile picture. Your current schema is more than acceptable. The "null is bad" argument doesn't hold any water in my book.
If you're looking for a slightly better schema, then you could do something like drop the "profile_picture_id" column from the users table, and then make a "user_id" column in the pictures table with a foreign key relationship back to users. Then you could even enforce a UNIQUE constraint on the user_id foreign key column so that you can't have more than one instance of a user_id in that table.
EDIT: It's also worth noting that this alternate schema could be a little bit more future-proof should you decide to allow users to have more than one profile picture in the future. You can simply drop the UNIQUE constraint on the foreign key and you're done.
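
A minimal sketch of this alternate schema, using SQLite from Python as a stand-in for the actual database. The name and file_name columns and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE pictures (
    picture_id INTEGER PRIMARY KEY,
    -- UNIQUE limits each user to at most one picture
    user_id    INTEGER NOT NULL UNIQUE REFERENCES users (user_id),
    file_name  TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO pictures (user_id, file_name) VALUES (1, 'alice.png');
""")

# A second picture for the same user violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO pictures (user_id, file_name) VALUES (1, 'alice2.png')")
    second_picture_rejected = False
except sqlite3.IntegrityError:
    second_picture_rejected = True

# Users without a picture still appear; the picture columns come back as NULL.
rows = conn.execute("""
SELECT u.name, p.file_name
FROM users u
LEFT JOIN pictures p ON p.user_id = u.user_id
ORDER BY u.user_id
""").fetchall()
```

Note that the NULL has not disappeared from query results; it has moved from a stored column to the outer join.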
It is true that having many columns with null values is not recommended. I would suggest you make the picture table a weak entity of user table and have an identifying relationship between the two. Picture table entries would depend on user id.
Make the profile picture a nullable field on the user table and be done with it. Sometimes people normalize just for normalization's sake. Null is perfectly fine, and in DB2, NULL is a first class citizen of values with NULL being included in indices.
I agree that NULL is bad. It is not relational-database-style.
Null is avoided by introducing an extra table named UserPictureIds. It would have two columns, UserId and PictureId. If there's none, it simply would not have the respective line, while user is still there in Users table.
Edit due to peer pressure
This answer focuses not on why NULL is bad - but, on how to avoid using NULLs in your database design.
For evaluating (NULL==NULL)==(NULL!=NULL), please refer to comments and google.
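
For readers who would rather run it than google it, here is a short sketch of how NULL comparisons evaluate, using SQLite from Python. SQLite follows standard three-valued logic here; SQL Server behaves the same way under its default ANSI_NULLS setting. The users table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both comparisons yield NULL (unknown), not true or false.
eq, ne, is_null = conn.execute(
    "SELECT NULL = NULL, NULL <> NULL, NULL IS NULL"
).fetchone()

conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, profile_picture_id INTEGER);
INSERT INTO users VALUES (1, 10), (2, NULL);
""")

# "= NULL" never matches a row; "IS NULL" is the test that works.
with_equals = conn.execute(
    "SELECT user_id FROM users WHERE profile_picture_id = NULL"
).fetchall()
with_is_null = conn.execute(
    "SELECT user_id FROM users WHERE profile_picture_id IS NULL"
).fetchall()
```

Python's sqlite3 module returns SQL NULL as None, so eq and ne both come back as None while is_null comes back as 1.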

T-SQL Database Relationships PK FK Same Name?

Scenario
I have 3 database tables. One is for Images and the other two are People & Places. Since each person can have many images and each place can have many images, I want to have a ONE TO MANY Relationship between both people and images, as well as places and images.
Question
Does the foreign key have to be called the same name as the primary key? Or is it possible for me to call the Foreign key in the images table something generic, for example "PKTableID". This way I only need one image table.
Help greatly appreciated.
Regards,
EDIT:
The reason for wanting to have only a single image table is that each image only refers to a single other table. As well as this, I used the example here of two tables; the actual database I will be using will have 20 tables, so I wanted to know whether it was still possible to use a SINGLE IMAGE TABLE FOR 20 ONE-TO-MANY RELATIONSHIPS?
EDIT
If one image is only ever owned by one of the twenty tables, this design might work:
People (PersonId, Name)
Places (PlaceId, Name)
Dogs (DogId, Breed)
Doors (DoorId, Height, Width)
Images (ImageId, ImageBinary, OwnerId, OwnerTable)
Where OwnerTable is the name or the code for the table that OwnerId belongs to.
This would save you 20 FKs in the image table, or 20 association tables. Then, in the joins, you would specify OwnerTable, depending on the table you are joining to.
You would need to use convertible types for the Ids (eg, TINYINT, SMALLINT, and INT), and preferably one type for all (eg, INT), and you would have to manage referential integrity yourself through triggers or some other code.
/EDIT
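
A small sketch of the OwnerId/OwnerTable design, run on SQLite from Python rather than SQL Server. The CHECK listing the allowed owner tables and the sample rows are assumptions. The last insert shows what "manage referential integrity yourself" means in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (PersonId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Places (PlaceId  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Images (
    ImageId     INTEGER PRIMARY KEY,
    ImageBinary BLOB,
    OwnerId     INTEGER NOT NULL,   -- no FOREIGN KEY can be declared here
    OwnerTable  TEXT NOT NULL CHECK (OwnerTable IN ('People', 'Places'))
);
INSERT INTO People VALUES (1, 'Ann');
INSERT INTO Places VALUES (1, 'Paris');
INSERT INTO Images (ImageBinary, OwnerId, OwnerTable)
VALUES (x'00', 1, 'People'), (x'01', 1, 'Places'), (x'02', 1, 'Places');
""")

# Every join must name the owner table as well as the id.
people_images = conn.execute("""
SELECT p.Name, i.ImageId
FROM People p
JOIN Images i ON i.OwnerId = p.PersonId AND i.OwnerTable = 'People'
ORDER BY i.ImageId
""").fetchall()
place_images = conn.execute("""
SELECT pl.Name, i.ImageId
FROM Places pl
JOIN Images i ON i.OwnerId = pl.PlaceId AND i.OwnerTable = 'Places'
ORDER BY i.ImageId
""").fetchall()

# The database accepts an image whose owner does not exist.
conn.execute(
    "INSERT INTO Images (ImageBinary, OwnerId, OwnerTable) "
    "VALUES (x'03', 999, 'People')"
)
orphans = conn.execute("""
SELECT COUNT(*)
FROM Images i
WHERE i.OwnerTable = 'People'
  AND NOT EXISTS (SELECT 1 FROM People p WHERE p.PersonId = i.OwnerId)
""").fetchone()[0]
```

Forgetting the OwnerTable condition in a join would silently match Ann with Paris's images, since both owners have id 1.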
You need 5 tables, not 3:
People (PersonId, Name)
Places (PlaceId, Name)
Images (ImageId, ImageBinary)
ImagesPeople (ImageId, PersonId)
ImagesPlaces (ImageId, PlaceId)
You can call the fields whatever you want. People.Id, ImagesPeople.PersonId, etc.
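
The five-table layout can be sketched like this, with SQLite from Python standing in for SQL Server. The composite primary keys on the two association tables and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE People (PersonId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Places (PlaceId  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Images (ImageId  INTEGER PRIMARY KEY, ImageBinary BLOB);
CREATE TABLE ImagesPeople (
    ImageId  INTEGER NOT NULL REFERENCES Images (ImageId),
    PersonId INTEGER NOT NULL REFERENCES People (PersonId),
    PRIMARY KEY (ImageId, PersonId)
);
CREATE TABLE ImagesPlaces (
    ImageId INTEGER NOT NULL REFERENCES Images (ImageId),
    PlaceId INTEGER NOT NULL REFERENCES Places (PlaceId),
    PRIMARY KEY (ImageId, PlaceId)
);
INSERT INTO People VALUES (1, 'Ann');
INSERT INTO Places VALUES (1, 'Paris');
INSERT INTO Images VALUES (1, x'00'), (2, x'01');
INSERT INTO ImagesPeople VALUES (1, 1), (2, 1);
INSERT INTO ImagesPlaces VALUES (2, 1);
""")

ann_images = [
    row[0]
    for row in conn.execute("""
        SELECT ip.ImageId
        FROM ImagesPeople ip
        JOIN People p ON p.PersonId = ip.PersonId
        WHERE p.Name = 'Ann'
        ORDER BY ip.ImageId
    """)
]

# Unlike a generic id column, a link to a missing person is refused.
try:
    conn.execute("INSERT INTO ImagesPeople VALUES (1, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Here image 2 is linked to both Ann and Paris, which the association tables allow; if each image should belong to exactly one owner, that rule needs an additional constraint.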
But what you can't do is something like this:
People (PersonId, Name)
Places (PlaceId, Name)
Images (ImageId, ImageBinary, PlaceOrPersonId)
Well, you can, but the database won't help you enforce the relationship, or tell you which table the FK belongs to. How would you know? There are hackish work-arounds like staggering the id increments, adding a type column to Images, or using GUIDs.
Alternatively:
Things (ThingId PK, Type)
People (ThingId PK/FK, Name, Age)
Places (ThingId PK/FK, Name, LatLon)
Images (ImageId PK, ImageBinary, ThingId FK)
You could also make Images a "Thing". Sometimes you see designs like this. It does give you referential integrity, but not type exclusivity. You could have the same ThingId in People, Places and Images, and the database wouldn't care. You would have to code that rule yourself.
Edit: at Cylon Cat's suggestion, scenario 4:
People (PersonId, Name)
Places (PlaceId, Name)
PeopleImages (PeopleImageId, PersonId FK, ImageBinary)
PlaceImages (PlaceImageId, PlaceId FK, ImageBinary)
Here, images are exclusively owned by one person or place. Similar to version 2, but with declared foreign keys. It may have some performance benefits vs the 5 table design, since fewer joins are required. You lose "Image" as a distinct entity, replaced by "PeopleImage" and "PlaceImage".
As Peter pointed out, you really need a many-to-many relationship, unless you want to restrict your images to pictures with only one person. Also, a good place index will reflect the hierarchical nature of place names (Montmartre, Paris, France = three possible names for one place).
Now technically you can call your indexes and tables anything that is not a reserved word and has not been used already. X, Y, Z12345 and WOBBLY are all valid names for an index.
In practice, however, it's best to follow a naming convention that points to what's being stored and what for. So tables PEOPLE_X_IMAGES and PLACES_X_IMAGES would be a good idea in your case. You don't really need anything about people or places in the actual images table. Likewise, you don't need anything about images in the PEOPLE and PLACES tables.
You could theoretically add two foreign keys to the images table, one for People and one for Places. Then you could allow nulls and join on the appropriate columns when running a query. This is not much of a solution though, because what does this table look like when you have 14 tables that need to join with it?
If you are not going to use GUIDs, then I say you set it up as many-to-many for the sake of the next guy that has to understand it.
You can use one table, but you would need twenty different fields for the relationships. If you set up a foreign key relationship, then all the data in that column must relate to the parent table; you can't store two foreign key relationships in one column. So you must set up a column for each FK you want to have.
How about
Person (PersonId PK, Name, ImageId FK)
Image (ImageId PK, Name)
Place (PlaceId PK, Name, ImageId FK)

Resources