Have I resolved my database relationships correctly? - database

I'm sorry if this is the wrong place for this question. I volunteer for a charity group that has to store sensitive data, as we are a new type of format, there are no systems that fit within our needs or our budget. Someone else started building the database, I wasn't sure he was resolving the relationships correctly, so I presented him with an alternate ER model and now we haven't heard back from him, so I am left to build it by myself.
As we have to store sensitive data, I'm reluctant to put my database design on here in it's entirety, so if there is a way I can privately discuss this with someone, that would be my preference, as I would love to get someone else to check it in full to make sure it's ALL good... but for now, can someone confirm if I have resolved the relationships correctly, or if the original design was better?
The database description is: There are different types of members -
Client, Staff, Professional (Offsite), Supplier, Family, General. There are different types of Staff members: Managers, Volunteer, Professional (Onsite), Admin, Committee, Lecturer. A member can be one or many types eg: Client/Volunteer/Family, Supplier/Volunteer, Manager/Lecturer/Volunteer/Committee/Family.
The original guy resolved this by creating a separate table for each user, each table storing a name and address eg:
Client - ClientName, ClientAddress
Professional - ProfessionalName, ProfessionalAddress
Employee - EmployeeName, EmployeeAddress
Family - FamilyName, FamilyAddress
My only problem with this is that I would ideally like one person to have one MemberID with their name and address, but with the original design each person would have a different ID for each type of person that they were, all storing name, address, phone number, email etc.
I thought that creating a Member table and having a Member Type table with a joining Member Type List table would be a better design. This is how I have resolved the issue:
Member Tables
Have I done this correctly or should I continue with the original design?
Thanks
Update:
Staff Model

It makes sense to store all member related data within one table.
Also for programming, I cannot imagine any use case that would support having different tables for each member type.
That being said, I advise you to look up the concept of "user roles", since this seems very similar.
You have different users (members) and they can have different roles (member type). Based on your roles you might want to show different data / allow different actions / send specific mails (or whatever else you can imagine).
So generally your approach looks good. The only thing I think about is that right now you don't have stored who is a "Staff" member for example. If you just have one list with different names you don't store the structure.
Depending on your use cases you can e.g. make another column in MemberType table "isStaff". Or, if you need to be more flexible and there are likely more different member types in the future, you can make another table (e.g.) MemberTypeParent and set a foreign key on your MemberType table to that table to make the connection.
It all depends on what you want to do with the data in the future.

Related

Type tables and software hardcoded values

I have this DB model:
(TEXT is actually VARCHAR)
entity_group_type is not modifiable at runtime, but it will be modified in a near future to add more entries, several times, by the development team.
Now I need to retrieve all entries from entity that are of a given entity_group_type. How should the software handle this kind of queries? Should I hardcode entity_group_type _id/name in the software? If so, why do I even need this table then? And what's better, hardcode _id or name?
Or is this the wrong way to structure my data?
Thanks in advance!
Taking your questions one at a time:
How should you refer to the entity group in the software? Hard-code the id, or the name?
Refer to the entity groups in a way that makes your code the most readable. So, perhaps you use the name, or perhaps a constant that looks like the name but whose value is the id. Using a constant can avoid one join when you are looking up entities by group type, but usually this is not much of a concern.
Why do I even need that table in the DB then? Is this the wrong way to structure my data?
This is a perfectly acceptable way to structure your data. The most correct way depends on what you are doing with the data, but for most applications your structure would be correct. However, you certainly don't need that table in the database--you could instead have a "group_type" field on the entity_group table. Here are the pros and cons:
Advantages of the current structure:
Easy to add fields that describe the entity_group_type. For example, you might want to make some group types only viewable by admin users, or disabled, or whatever. If this is a possiblity for the future, it pretty much requires this database structure.
Ability to have your database software enforce referential integrity, meaning the data is kept consistent between the entity_group and entity_group_type tables.
Advantages of adding a group_type field on the entity_group table:
Possibly a simpler representation in your code. For exameple, if you use a MVC architecture, having the extra table might require another model object in your code. That's usually not a problem, and may have advantages, but sometimes simpler is better.
When you are looking up entities by entity group type, your SQL statements will be slightly simpler, since there will be one less table/join involved.
I think in most cases your current structure comes out ahead, although it does depend of how you use the data. Unless you had a good reason to structure the data differently, I would stick with your current structure.
I think you should definitely store entity_group_type in the database, in its own table, as per your design diagram above. Storing that information in the DB makes it possible to query against that information, which adds flexibility to your design.
Once you have this information in the DB, your question seems to be whether it should be broken out into its own table, or just be stored as column in the entity_type table. I think you should break it out into its own table, with a foreign key constraint from entity_type to entity_group_type.
Having entity_group_type in its own table allows for group types to be preserved even if all of the entity groups of that type are removed from the DB.
You can leverage the foreign key to ensure that the entity_group_type name is spelled consistently. Inserted entity_groups must satisfy the foreign key constraint, and so would have to either reference an existing entity_group_type having a properly spelled name, or insert an entity_group_type which will set the proper spelling for that new entity_group_type.
Use a synthetic key for your entity_group_type table, to reduce painful redundancy in the appearance of entity_group_type name appearance. This makes the data model more DRY, and allows updating the entity_group_type's name to be a simple update to one table.
As for storing the entity_group_type in your application logic, I would suggest storing the name, and looking up the id for the entity_group_type when it is needed. I think that would make the application codebase sowewhat more readable, and I think I've made a compelling argument for why I think this relevant information should have a representation in the database.
Given, that you know the name of your entity_group_type at runtime, then the correct way to find all entity_groups with that entity_group_type.name, you would use a join:
SELECT entity_group._id, entity_group.name, entity_group_type.name AS groupTypeName
FROM entity_group
JOIN entity_group_type ON entity_group.id_type = entity_group_type._id
WHERE entity_group_type.name = 'someGroupName';
This would result in all entity_group information for the given entity_group_type name.
And yes, this is a good or at least possible way to store that kind of information. Just imagine that eventually the entity_group_type gets a new attribute, e.g. disabled. Then it is still easily possible to find all entity_groups which are (not) in disabled types.

Is it a good idea to create a db with a generic table entity that can be decorated with a role and metadatas?

I've been thinking about creating a database that, instead of having a table per object I want to represent, would have a series of generic tables that would allow me to represent anything I want and even modifying (that's actually my main interest) the data associated with any kind of object I represent.
As an example, let's say I'm creating a web application that would let people make appointments with hairdressers. What I would usually do is having the following tables in my database :
clients
hairdressers: FK: id of the company the hairdresser works for
companies
appointments: FK: id of the client and the hairdresser for that appointment
But what happens if we deal with scientific hairdressers that want to associate more data to an appointment (e.g. quantity of shampoo used, grams of hair cut, number of scissor's strokes,...) ?
I was thinking instead of that, I could use the following tables:
entity: represents anything I want. PK(entity_id)
group: is an entity (when I create a group, I first create an entity which
id is then referred to by the FK of the group). PK(group_id), FK(entity_id)
entity_group: each group can contain multiple entity (thus also other groups): PK(entity_id, group_id).
role: e.g. Administrator, Client, HairDresser, Company. PK(role_id)
entity_role: each entity can have multiple roles: PK(entity_id, role_id)
metadata: contains the name and type of the metadata aswell as the associated role and a flag that describes if its mandatory or not. PK(metadata_id), FK(metadata_type_id, role_id)
metadata_type: contains information about available metadata types. PK(metadata_type_id)
metadata_value: PK(metadata_value_id), FK(metadata_id)
metadata_: different tables for the different types e.g. char, text, integer, double, datetime, date. PK(metadata__id), FK(metadata_value_id) which contain the actual value of a metadata associated with an entity.
entity_metadata: contains data associated with an entity e.g. name of a client, address of a company,... PK(entity_id, metadata_value_id). Using the type of the metadata, its possible to select the actual value of a metadata for this entity in the corresponding table.
This would allow me to have a completely flexible data structure but has a few drawbacks:
Selecting the metadatas associated with an entity returns multiple rows that I have to process in my code to create the representation of the entity in my code.
Selecting metadatas of multiple entities requires to loop over the same process as above.
Selecting metadatas will also require me to do a select for each one of the metadata_* table that I have.
On the other hand, it has some advantages. For example, instead of having a client table with a lot of fields that will almost never be filled, I just use the exact number of rows that I need.
Is this a good idea at all?
I hope that I've expressed clearly what I'm trying to achieve. I guess that I'm not the first one who wonders how to achieve that but I was not able to find the right keywords to find an answer to that question :/
Thanks!

Simple Database - Design issues

Just a homework question I am trying to figure out, I would appreciate some assistance.
Apparently, there are three problems with the design of this database design:
Account = {AccNumber, Type, Balance}
Customer = {CustID, FirstName, LastName, Address, AccNumber}
The one that is pretty obvious is that 'CustID' is useless if 'AccNumber' exists.
I am not quite sure about the second and third problem.
Is there a problem with a separate attribute for 'FirstName' and "LastName', cant we just use 'Name'?
And another option, if 'AccNumber' is the primary key (assuming CustID will be removed), it probably should be place in the beginning :
Such as:
Customer = {AccNumber, Name, Address}
Any input would be appreciated!
Thanks
The customer-account relationship, at first glance, appears to be a many-many relationship, which necessitates the use of an intermediary relationship table. For instance, I have three accounts of my own at my bank. In addition, my wife has two of her own. Finally, we have a shared account. The schema above could not well handle such relationships.
You could, indeed, just use "Name" - but you may need to know what the first or last names are at some point in the future and such a concatination can be quite problematic to split.
Good luck with your homework...
The problem is that you haven't presented us with what the database should represent in words; as it is now, there's nothing "wrong" with the design, since we don't know what the design is supposed to model.
I certainly wouldn't say that CustID is useless, as it serves as the primary key of the table. What you need to determine is the relationship between customers and accounts. It should be one of the following:
A single customer can be tied to multiple accounts, but a single account can be tied to a single customer
A single customer can be tied to only one account, but an account can be tied to multiple customers.
A single customer can be tied to multiple accounts, and a single account can be tied to multiple customers
Right now, with AccNumber in the Customer table, your design models #2.
How is is designed right now, each customer could only have one bank account.
The many-to-many relationship will be a problem. Instead, you might create a third table that holds the relationships. For example:
Account = {AccNumber, Type, Balance}
Connection = {ConnID, AccNumber, CustID}
Customer = {CustID, FirstName, LastName, Address}
This way, both Account and Customer are parented by Connection (for lack of a better name). You could query all connections with a certain AccNumber and find all the customers using that account, and vice versa.

Database Design: Explain this schema

Full disclosure...Trying feverishly here to learn more about databases so I am putting in the time and also tried to get this answer from the source to no avail.
Barry Williams from databaseanswers has this schema posted.
Clients and Fees Schema
I am trying to understand the split of address tables in this schema. Its clear to me that the Addresses table contains the details of a given address. The Client_Addresses and Staff_Addresses tables are what gets me.
1) I understand the use of Primary Foreign Keys as shown but I was under the assumption that when these are used you don't have a resident Primary Key in that same table (date_address_from in this case). Can someone explain the reasoning for both and put it into words how this actually works out?
2) Why would you use date_address_from as the primary key instead of something like client_address_id as the PK? What if someone enters two addresses in one day would there be conflicts in his design? If so or if not, what?
3) Along the lines of normalization...Since both date_address_from and date_address_to are the same in the Client_Addresses and Staff_Addresses table should those fields just not be included in the main Address table?
Evaluation
First an Audit, then the specific answers.
This is not a Data Model. This is not a Database. It is a bucket of fish, with each fish drawn as a rectangle, and where the fins of one fish are caught in the the gills of another, there is a line. There are masses of duplication, as well as masses of missing elements. It is completely unworthy of using as an example to learn anything about database design from.
There is no Normalisation at all; the files are very incomplete (see Mike's answer, there are a hundred more problem like that). The other_details and eg.s crack me up. Each element needs to be identified and stored: StreetNo, ApartmentNo, StreetName, StreetType, etc. not line_1_number_street, which is a group.
Customer and Staff should be normalised into a Person table, with all the elements identified.
And yes, if Customer can be either a Person or an Organisation, then a supertype-subtype structure is required to support that correctly.
So what this really is, the technically accurate terms, is a bunch of flat files, with descriptions for groups of fields. Light years distant from a database or a relational one. Not ready for evaluation or inspection, let alone building something with. In a Relational Data Model, that would be approximately 35 normalised tables, with no duplicated columns.
Barry has (wait for it) over 500 "schemas" on the web. The moment you try to use a second "schema", you will find that (a) they are completely different in terms of use and purpose (b) there is no commonality between them (c) let's say there was a customer file in both; they would be different forms of customer files.
He needs to Normalise the entire single "schema" first,
then present the single normlaised data model in 500 sections or subject areas.
I have written to him about it. No response.
It is important to note also, that he has used some unrecognisable diagramming convention. The problem with these nice interesting pictures is that they convey some things but they do not convey the important things about a database or a design. It is no surprise that a learner is confused; it is not clear to experienced database professionals. There is a reason why there is a standard for modelling Relational databases, and for the notation in Data Models: they convey all the details and subtleties of the design.
There is a lot that Barry has not read about yet: naming conventions; relations; cardinality; etc, too many to list.
The web is full of rubbish, anyone can "publish". There are millions of good- and bad-looking "designs" out there, that are not worth looking at. Or worse, if you look, you will learn completely incorrect methods of "design". In terms of learning about databases and database design, you are best advised to find someone qualified, with demonstrated capability, and learn from them.
Answer
He is using composite keys without spelling it out. The PK for client_addresses is client_id, address_id, date_address_from). That is not a bad key, evidently he expects to record addresses forever.
The notion of keeping addresses in a separate file is a good one, but he has not provided any of the fields required to store normalised addresses, so the "schema" will end up with complete duplication of addresses; in which case, he could remove addresses, and put the lines back in the client and staff files, along with their other_details, and remove three files that serve absolutely no purpose other than occupying disk space.
You are thinking about Associative Tables, which resolve the many-to-many relations in Databases. Yes, there, the columns are only the PKs of the two parent tables. These are not Associative Tables or files; they contain data fields.
It is not the PK, it is the third element of the PK.
The notion of a person being registered at more than one address in a single day is not reasonable; just count the one address they slept the most at.
Others have answered that.
Do not expect to identify any evidence of databases or design or Normalisation in this diagram.
1) In each of those tables the primary key is a compound key consisting of three attributes: (staff_id, address_id, date_address_from) and (client_id, address_id, date_address_from). This presumably means that the mapping of clients/staff to addresses is expected to change over time and that the history of those changes is preserved.
2) There's no obvious reason to create a new "id" attribute in those tables. The compound key does the job adequately. Why would you want to create the same address twice for the same client on the same date? If you did then that might be a reason to modify the design but that seems like an unlikely requirement.
3) No. The apparent purpose is that they are the applicable dates for the mapping of address to client/staff - not dates applicable to the address alone.
3) Along the lines of
normalization...Since both
date_address_from and date_address_to
are the same in the Client_Addresses
and Staff_Addresses table should those
fields just not be included in the
main Address table?
No. But you did find a problem.
The designer has decided that clients and staff are two utterly different things. By "utterly different", I mean they have no attributes in common.
That's not true, is it? Both clients and staff have addresses. I'm sure most of them have telephones, too.
Imagine that someone on staff is also a client. How many places is that person's name stored? That person's address? Can you hear Mr. Rogers in the background saying, "Can you spell 'update anomaly'? . . . I knew you could."
The problem is that the designer was thinking of clients and staff as different kinds of people. They're not. "Client" describes a business relationship between a service provider (usually, that is, not a retailer) and a customer, which might be either a person or a company. "Staff" describes a employment relationship between a company and a person. Not different kinds of people--different kinds of relationships.
Can you see how to fix that?
This 2 extra tables enables you to have address history per one person.
You can have them both in one table, but since staff and client are separated, it is better to separate them as well (b/c client id =1 and staff id =1 can't be used on the same table of address).
there is no "single" solution to a design problem, you can use 1 person table and then add a column to different between staff and client. BUT The major Idea is that the DB should be clear, readable and efficient, and not to save tables.
about 2 - the pk is combined, both clientID, AddressID and from.
so if someone lives 6 month in the states, then 6 month in Israel, and then back to the states, to the same address - you need only 2 address in address table, and 3 in the client_address.
The idea of heaving the from_Date as part of the key is right, although it doesn't guaranty data integrity - as you also need manually to check that there isn't overlapping dates between records of the same person.
about 3 - no (look at 2).
Viewing the data model, i think:
1) PF means that the field is both part of the primary key of the table and foreign key with other table.
2) In the same way, the primary key of Staff_Addresses is {staff_id,address_id,date_adderess_from} not just date_adderess_from
3) The same that 2)
In reference to Staff_Addresses table, the Primary Key on date_address_from basically prevents a record with the same staff_id/address_id entered more than once. Now, i'm no DBA, but i like my PKs to be integers or guids for performance reasons/faster indexing. If i were to do this i would make a new column, say, Staff_Address_Id and make it the PK column and put a unique constraint on staff_id/address_id/date_address_from.
As for your last concern, Addresses table is really a generic address storage structure. It shouldn't care about date ranges during which someone resided there. It's better to be left to specific implementations of an address such as Client/Staff addresses.
Hope this helps a little.

Table "Inheritance" in SQL Server

I am currently in the process of looking at a restructure our contact management database and I wanted to hear peoples opinions on solving the problem of a number of contact types having shared attributes.
Basically we have 6 contact types which include Person, Company and Position # Company.
In the current structure all of these have an address however in the address table you must store their type in order to join to the contact.
This consistent requirement to join on contact type gets frustrating after a while.
Today I stumbled across a post discussing "Table Inheritance" (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server).
Basically you have a parent table and a number of sub tables (in this case each contact type). From there you enforce integrity so that a sub table must have a master equivalent where it's type is defined.
The way I see it, by this method I would no longer need to store the type in tables like address, as the id is unique across all types.
I just wanted to know if anybody had any feelings on this method, whether it is a good way to go, or perhaps alternatives?
I'm using SQL Server 05 & 08 should that make any difference.
Thanks
Ed
I designed a database just like the link you provided suggests. The case was to store the data for many different technical reports. The number of report types is undefined and will probably grow to about 40 different types.
I created one master report table, that has an autoincrement primary key. That table contains all common information like customer, testsite, equipmentid, date etc.
Then I have one table for each report type that contains the spesific information relating to that report type. That table have the same primary key as the master and references the master as well.
My idea for splitting this into different tables with a 1:1 relation (which normally would be a no-no) was to avoid getting one single table with a huge number of columns, that gets very difficult to maintain as your constantly adding columns.
My design with table inheritance gave me segmented data and expandability without beeing difficult to maintain. The only thing I had to do was to write special a special save method to handle writing to two tables automatically. So far I'm very happy with the design and haven't really found any drawbacks, except for a little more complicated save method.
Google on "gen-spec relational modeling". You'll find a lot of articles discussing exactly this pattern. Some of them focus on table design, while others focus on an object oriented approach.
Table inheritance pops up in a few of them.
I know this won't help much now, but initially it may have been better to have an Entity table rather than 6 different contact types. Then each Entity could have as many addresses as necessary and there would be no need for type in the join.
You'll still have the problem that if you want the sub-type fields and you have only the master contact, you'll have to know what table to go looking at - or else join to all of them. But otherwise this is a workable solution to a common problem.
Another possibility (fairly similar in structure, but different in how you think of it) is to simply put all your contacts into one table. Then for the more specific fields (birthday say for people and department for position#company) create separate tables that are associated with that contact.
Contact Table
--------------
Name
Phone Number
Address Table
-------------
Street / state, etc
ContactId
ContactBirthday Table
--------------
Birthday
ContactId
Departments Table
-----------------
Department
ContactId
It requires a different way of thinking of things though - instead of thinking of people vs. companies, you think of the various functional requirements for the task at hand - if you want to send out birthday cards, get all the contacts that have birthdays associated with them, etc..
I'm going to go out on a limb here and suggest you should rethink your normalization strategy (as you seem to be lucky enough to be able to rethink your schema quite fundamentally). If you typically store an address for each contact, then your contact table should have the address fields in it. Alternatively if the address is stored per company then the address should be stored in the company table and your contacts linked to that company.
If your contacts only have one address, or one (or even 3, just not 'many') instance of the other fields, think about rationalizing them into a single table. In my experience having a few null fields is a far better alternative than needing left joins to data you aren't sure exists.
Fortunately for anyone who vehemently disagrees with me you did ask for opinions! :) IMHO you should only normalize when you really need to. Where you are rethinking schemas, denormalization should be considered at every opportunity.
When you have a 7th type, you'll have to create another table.
I'm going to try this approach. Yes, you have to create new tables when you have a new type, but since this table will probably have different columns, you'll end up doing this anyway if you don't use this scheme.
If the tables that inherit the master don't differentiate much from one another, I'd recommend you try another approach.
May I suggest that we just add a Type table. Ie a person has an address, name etc then the student, teacher as each use case presents its self we have a PersonType table that has an entry from the person table to n types and the subsequent new tables teacher, alien, singer as the system eveolves...

Resources