I am modelling a loan database for a friend.
A Customer can have 0 to N Addresses (a street address, a PO Box address, or even more than one street address and more than one PO Box address). A Property must have only one Address. A Company (employment info) must have only one Address.
It seems better to have a separate Addresses table for the Customers table. The addresses for Property and Company can go in the Properties and Companies tables.
But since we have an Addresses table here, do you think it is a good idea or not to share that Addresses table for Companies and Properties tables as well?
When we think about the relationship between entities, should we pick a single point in time (a static view?) or look at a certain range of time (a dynamic view?) to analyze the relationship? For example, a company can only have ONE address at a certain point in time, but that company may have moved from one place to another recently. Then a company may have more than one address for a certain range of time.
Customer would be better with a 1 to N than a 0 to N relationship; since you are making loans, you will probably want to know the customer's address.
A Company (employment info) must have only one address.
Then a company may have more than one address for a certain range of time.
You are contradicting yourself a bit. Why would you need the two addresses? I think the company will have just one official address until everything has moved to the new address, at which point you can update your DB to the new one.
But since we have an Addresses table here, do you think it is a good idea or not to share that Addresses table for Companies and Properties tables as well?
Yes
And here a nice link with some ideas on modelling:
http://www.databaseanswers.org/data_models/
A Company (employment info) must have only one Address.
Not necessarily. A Company can have a mailing address and a physical address.
Since we have an Addresses table here, do you think it is a good idea or not to share that Addresses table for Companies and Properties tables as well?
Yes, it's a good idea to put addresses in the Addresses table. Your Properties table would have an address row foreign key, and your Companies table would have 2 foreign keys, one for a mailing address and one for a physical address. The mailing address would be an optional (nullable) foreign key.
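For what it's worth, here is a minimal sketch of that layout using Python's built-in sqlite3 module. The column names (PhysicalAddressID, MailingAddressID, Line1, City) and the helper can_insert_company are assumptions made up for illustration; only the idea of one mandatory and one nullable foreign key comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE Addresses (
    AddressID INTEGER PRIMARY KEY,
    Line1     TEXT NOT NULL,
    City      TEXT NOT NULL
);
-- A property has exactly one address.
CREATE TABLE Properties (
    PropertyID INTEGER PRIMARY KEY,
    AddressID  INTEGER NOT NULL REFERENCES Addresses(AddressID)
);
-- A company has a mandatory physical address and an optional mailing address.
CREATE TABLE Companies (
    CompanyID         INTEGER PRIMARY KEY,
    Name              TEXT NOT NULL,
    PhysicalAddressID INTEGER NOT NULL REFERENCES Addresses(AddressID),
    MailingAddressID  INTEGER REFERENCES Addresses(AddressID)
);
INSERT INTO Addresses VALUES (1, '12 Main St', 'Springfield');
""")


def can_insert_company(company_id, name, physical_id, mailing_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Companies VALUES (?, ?, ?, ?)",
            (company_id, name, physical_id, mailing_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, a company with no mailing address is accepted, while one with no physical address, or one pointing at an address row that does not exist, is rejected by the database itself.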
You would need a CustomerAddress table to maintain the 0 to N relationship between Customer and Address. If you want, you can also have a 0 to N relationship between Address and Customer.
The table would look like this.
CustomerAddress
---------------
CustomerAddress ID
Customer ID
Address ID
The CustomerAddress ID is the primary (clustering) index. It is an ascending integer or long, or some other unique ID.
You would have a unique index on (Customer ID, Address ID).
If you want to associate addresses with customers, you would have another unique index on (Address ID, Customer ID).
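The CustomerAddress table and its two indexes could be sketched like this with Python's sqlite3 module. The index names, the sample rows and the helper link_is_rejected are hypothetical; the three columns and the two unique indexes are the ones described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Address  (AddressID  INTEGER PRIMARY KEY, Line1 TEXT NOT NULL);
CREATE TABLE CustomerAddress (
    CustomerAddressID INTEGER PRIMARY KEY,  -- ascending surrogate key
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
    AddressID  INTEGER NOT NULL REFERENCES Address(AddressID)
);
-- Customer -> addresses lookups, and no duplicate links.
CREATE UNIQUE INDEX ux_customer_address ON CustomerAddress (CustomerID, AddressID);
-- Address -> customers lookups.
CREATE UNIQUE INDEX ux_address_customer ON CustomerAddress (AddressID, CustomerID);

INSERT INTO Customer VALUES (1, 'Ann');
INSERT INTO Address VALUES (1, '12 Main St'), (2, 'PO Box 9');
INSERT INTO CustomerAddress (CustomerID, AddressID) VALUES (1, 1), (1, 2);
""")


def link_is_rejected(customer_id, address_id):
    """Return True if the unique index refuses a repeated customer/address link."""
    try:
        conn.execute(
            "INSERT INTO CustomerAddress (CustomerID, AddressID) VALUES (?, ?)",
            (customer_id, address_id),
        )
    except sqlite3.IntegrityError:
        return True
    return False


addresses_for_ann = [
    row[0]
    for row in conn.execute(
        """SELECT a.Line1
           FROM CustomerAddress ca
           JOIN Address a ON a.AddressID = ca.AddressID
           WHERE ca.CustomerID = 1
           ORDER BY a.AddressID"""
    )
]
```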
A company can only have ONE address at certain time point but that company may have moved from one place to another recently. Then a company may have more than one address for a certain range of time.
If this information is important, then you have to include a date written column in your CompanyAddress table. You would create a unique index on (Company ID, Date written descending). This way, the first row you retrieve from the CompanyAddress table for a company would point to its most current address.
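A rough sketch of the dated CompanyAddress idea, again with Python's sqlite3 module. The column name DateWritten, the ISO 8601 text dates and the helper current_address are assumptions for the example, not something the original design specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Address (AddressID INTEGER PRIMARY KEY, Line1 TEXT NOT NULL);
CREATE TABLE CompanyAddress (
    CompanyAddressID INTEGER PRIMARY KEY,
    CompanyID   INTEGER NOT NULL,
    AddressID   INTEGER NOT NULL REFERENCES Address(AddressID),
    DateWritten TEXT NOT NULL  -- ISO 8601, so text order matches date order
);
CREATE UNIQUE INDEX ux_company_date
    ON CompanyAddress (CompanyID, DateWritten DESC);

INSERT INTO Address VALUES (1, '1 Old Rd'), (2, '2 New Ave');
INSERT INTO CompanyAddress (CompanyID, AddressID, DateWritten)
VALUES (7, 1, '2019-03-01'), (7, 2, '2023-06-15');
""")


def current_address(company_id):
    """Return the most recently written address line, or None if there is none."""
    row = conn.execute(
        """SELECT a.Line1
           FROM CompanyAddress ca
           JOIN Address a ON a.AddressID = ca.AddressID
           WHERE ca.CompanyID = ?
           ORDER BY ca.DateWritten DESC
           LIMIT 1""",
        (company_id,),
    ).fetchone()
    return row[0] if row else None
```

The older rows stay in the table as history; only the ORDER BY ... LIMIT 1 decides which one counts as current.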
It seems like a very popular idea to put all Addresses in their own table. Developers love to seek out repetition and eliminate it. But in this case I would hesitate to dignify addresses with Entity status by putting them in their own dedicated table, because if, like most applications, you don't treat addresses as full-fledged entities, this gets overcomplicated.
If you treated addresses as real entities then if two companies somehow shared the same address, or one inhabited a location for a while, then another one inhabited that same location, then those companies would reference the same address. Because when your application was accepting an address as input it would go see if there was an existing address and reference it rather than just slam some garbage into the address table. Which one do you intend to do? I expect it's the slam one, which is fine, because like most business applications you totally don't care if the new address you're putting in is the same as some other address already in the database, you have no interest in tracking the addresses as individual things. And that's the difference between entities and cat food.
So with the consolidation we have to introduce an intersection table, and index it, and all our entities that have addresses have to join to it, we have to think about whether to get the address eagerly or use lazy loading. We chucked all the addresses into one bucket and have to work to make sure everybody can get to their own address quickly. For real entities this makes some sense because different things need to link to the same entity, but we established above that we don't care about that, nobody is sharing these entries.
Where's the repetition we're eliminating by consolidating addresses into one table? The addresses are going to end up in the database somewhere regardless, with the same fields, we're not saving space. The only repetition is in the DDL used to generate the schema, which we can manage by making a reusable component (where "component" is the Hibernate term) for the address (which addresses redundancy in the application code) and using the ORM tool to generate the schema. Or, worst case, just ignore it, addresses don't change that much, it's not your biggest problem.
These requirements you are describing sound suspiciously enterprise-y for a project you're doing for a friend. Possibly your friend's brain has been poisoned by overexposure to elaborate requirements concocted by committees who don't know what they're doing. It's bad enough we have to put up with this junk at work, but for personal projects? Try to talk him down.
But maybe your friend is outsourcing his enterprise-y work to you and you're stuck with 0-N addresses per customer. If so, contain the damage: make a table exclusively for customer addresses, so you don't need the intersection table, and put the other entities' addresses inline. Making these entities that have only one address go get their address from another table doesn't buy you anything but more joins. If you need history, write it to a separate history table where it's out of the way.
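To make the "contain the damage" layout concrete, here is a small sketch using Python's sqlite3 module. All table and column names are assumptions; the point is only that customer addresses carry the customer's key directly and companies keep their single address inline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- Customer addresses point straight back at their customer: no intersection table.
CREATE TABLE CustomerAddresses (
    CustomerAddressID INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
    Line1 TEXT NOT NULL,
    City  TEXT NOT NULL
);
-- Entities with exactly one address carry it inline.
CREATE TABLE Companies (
    CompanyID    INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    AddressLine1 TEXT NOT NULL,
    AddressCity  TEXT NOT NULL
);

INSERT INTO Customers VALUES (1, 'Ann');
INSERT INTO CustomerAddresses (CustomerID, Line1, City)
VALUES (1, '12 Main St', 'Springfield'), (1, 'PO Box 9', 'Springfield');
INSERT INTO Companies VALUES (1, 'Acme', '500 Factory Rd', 'Shelbyville');
""")

# The company's address comes back without any join.
company_row = conn.execute(
    "SELECT Name, AddressLine1, AddressCity FROM Companies WHERE CompanyID = 1"
).fetchone()

# A customer's 0-N addresses need one lookup on the child table.
customer_address_count = conn.execute(
    "SELECT COUNT(*) FROM CustomerAddresses WHERE CustomerID = 1"
).fetchone()[0]
```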
Related
Say for example, I have an ADDRESS table, that will store similar attributes of other entities like address, city, zip, country, etc. The entities are USER, COMPANY, BANK, BRANCH, etc. I would like to use this one table ADDRESS to store the addresses of the other entities rather than creating other tables for each entity to store the ADDRESS like so, USER_ADDRESS, COMPANY_ADDRESS, BANK_ADDRESS, BRANCH_ADDRESS.
Is this possible? Am I breaking any laws or conventions? What are the consequences, if any?
Each entity (USER, COMPANY, etc.) should contain a reference to an entry in the ADDRESS table.
There are a few issues:
If 2 users have the same address, they should reference the same address id.
You will need to normalise addresses so that you're not duplicating information (e.g. if you know the zip, then you usually know the city and country as well).
Of course, you may not want a well-normalised database. Saving the entire address as a string will improve read performance by reducing the number of join operations.
A lot of things depend on the exact use of the database.
It is fine to use a single ADDRESS table for that purpose and have an ADDRESS_ID in each of the other entities. Depends on the use case and the way you prefer to implement it. I most probably wouldn't do it. I also wouldn't do the other solution you're suggesting (an address table per entity).
So, let's say you want to implement a function to search for all the addresses, where it doesn't matter what type of entity is connected to it. You will have to search the ADDRESS table. If you get results, then you have to search the other four tables to see which record is connected to that address.
You could add a field ENTITY_TYPE in the ADDRESS table where you specify which type of entity it is connected to, so you don't have to search the four tables, but I don't recommend this since you can have consistency errors (USER 17 points to ADDRESS 14, but ADDRESS 14 has ENTITY_TYPE = BANK).
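The consistency problem is easy to reproduce. The sketch below uses Python's sqlite3 module and the USER 17 / ADDRESS 14 example from the text; the column names other than ENTITY_TYPE are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ADDRESS (
    ADDRESS_ID  INTEGER PRIMARY KEY,
    ADDRESS     TEXT NOT NULL,
    ENTITY_TYPE TEXT NOT NULL  -- 'USER', 'COMPANY', 'BANK' or 'BRANCH'
);
CREATE TABLE "USER" (
    USER_ID    INTEGER PRIMARY KEY,
    ADDRESS_ID INTEGER NOT NULL REFERENCES ADDRESS(ADDRESS_ID)
);
-- Nothing in the schema stops these two rows from disagreeing.
INSERT INTO ADDRESS VALUES (14, '5 High St', 'BANK');
INSERT INTO "USER" VALUES (17, 14);
""")

# Find users whose address claims to belong to some other entity type.
mismatched = conn.execute(
    """SELECT u.USER_ID, a.ADDRESS_ID, a.ENTITY_TYPE
       FROM "USER" u
       JOIN ADDRESS a ON a.ADDRESS_ID = u.ADDRESS_ID
       WHERE a.ENTITY_TYPE <> 'USER'"""
).fetchall()
```

A plain foreign key cannot express "this address must be tagged USER", so the mismatch can only be caught after the fact with a query like the one above.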
Now, with your other solution (having four separate tables to store the addresses of the four different entities) you're just going to have to search those four tables and then search the corresponding entity table to get the entity you're looking for.
My solution in most cases is adding the address fields to the entities tables themselves. Having ADDRESS, ZIP_CODE and COUNTRY_CODE (always use proper country codes, not country names https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes) will make it simple. When you present a list of items (users, banks, companies, offices, whatever), it's really common to show the name and the address at the same time in a table. Having no JOINS makes it faster and easier to process. If you want to update an address, it's on the table itself. No lookups!
Of course, like most things in programming, it depends on what your needs are.
Also, please, don't try to split the ADDRESS in more fields. I've seen ADDRESS_TYPE (street, road, avenue, square, ...), STREET_NAME, STREET_NUMBER, BLOCK_NUMBER, BLOCK_FLOOR, BLOCK_LETTER. I'm pretty sure you're never going to need something like SELECT * FROM USER WHERE STREET_NUMBER = 74.
The question might not be very clear in the title, let me explain:
In my model, I have a Person, that has an Address. However, many Persons can share the same Address.
As I was defining my model, I assumed that Person is an Entity, but Address a Value-Object since if you change a single property of the Address, well it's not the same Address anymore.
Since multiple Persons can share an Address, if I jump right into the database implementation, and naively assume that person has some address_xxxx fields, wouldn't it generate too many duplicates in the database? Isn't it better that person has an address_id field, related to an address table? If so, then Address is an Entity, right?
Is a Value Object that is used many times an entity?
No, but it depends...
It is often the case that a value object is actually a proxy identifier for an entity, that you may not have explicitly realized in your model.
For example:
1600 Pennsylvania Ave NW
Washington, DC
20500
If you look at that carefully, you'll see embedded in it
The name of a street
The name of a city
If those are references to a street/city entities in your model, then "address" is the representation of the current state of some entity (ex: "The White House").
Complicating things further - you want suitable abstractions for your model.
Consider money:
{USD:100}
That's a value type, we can replace any USD:100 with a "different" USD:100
{USD:100, SerialNumber:KB46279860I}
That's still a value (it's state), but it is the state of a specific bill that exists in circulation (somewhere). What we have here is an information resource that is describing an entity out in the real world, somewhere.
You also need to be careful about coincident properties. For example; the name of the street changes -- should the value of address change? If the model cares about the current identifier of a location, then perhaps it should. If the model is tracking what information you put on an envelope two months ago, then it certainly shouldn't. (In other words, when we changed the label for the street entity, the label already printed on the envelope entity didn't change).
It's an important question, but the answer changes depending on what you are modeling at the time.
In my model, I have a Person, that has an Address. However, many Persons can share the same Address.
Isn't it better that person has an address_id field, related to an address table? If so, then Address is an Entity, right?
You have to recognize that there are two distinct models, a domain model and a persistence model and both may not agree on whether a concept is an entity or a value.
The first thing you have to do is ask yourself what an address is from the domain perspective. Is your domain interested in the lifecycle of addresses, or are they just immutable values? For instance, what happens if there is a typo in an address? Do you simply discard the incorrect one and replace it, or would you rather modify the original address details to track its continuity? These questions will help you to determine whether an address is an entity or a value from the domain perspective.
Now, a concept may be a value in the domain while being an entity in the persistence model. For instance, let's say that you aren't interested in the lifecycle of addresses in the domain, but you are very concerned about optimizing the storage space. In that case, you could give identifiers to unique addresses in the DB and use that for relationships rather than copying the same address details multiple times.
However, doing so would introduce additional tension between your models, so you must be sure that there are real benefits in doing so.
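As an illustration of "value in the domain, entity in the persistence model", here is a sketch in Python. The Address class, the table layout and the helpers address_id_for and add_person are all hypothetical; it is one possible arrangement, not the only one.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Address:
    """Domain side: an immutable value, compared by its fields, with no identity."""
    street: str
    city: str
    postal_code: str


conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Persistence side: each distinct address is stored once and given an id.
CREATE TABLE address (
    address_id  INTEGER PRIMARY KEY,
    street      TEXT NOT NULL,
    city        TEXT NOT NULL,
    postal_code TEXT NOT NULL,
    UNIQUE (street, city, postal_code)
);
CREATE TABLE person (
    person_id  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    address_id INTEGER NOT NULL REFERENCES address(address_id)
);
""")


def address_id_for(addr):
    """Return the id of the stored row for this value, inserting it if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO address (street, city, postal_code) VALUES (?, ?, ?)",
        astuple(addr),
    )
    return conn.execute(
        "SELECT address_id FROM address"
        " WHERE street = ? AND city = ? AND postal_code = ?",
        astuple(addr),
    ).fetchone()[0]


def add_person(name, addr):
    conn.execute(
        "INSERT INTO person (name, address_id) VALUES (?, ?)",
        (name, address_id_for(addr)),
    )


add_person("Alice", Address("1600 Pennsylvania Ave NW", "Washington, DC", "20500"))
add_person("Bob", Address("1600 Pennsylvania Ave NW", "Washington, DC", "20500"))
stored_addresses = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]
```

The domain code never sees address_id; the id exists only to avoid storing the same details twice, which is exactly the extra mapping work the paragraph above warns about.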
I am creating an address dimension for a Snowflake Schema Data Warehouse. I have 75M locations on a source that I want to convert to said schema. I know how to handle Zip->City->County->State dimensions, but if I add street addresses to the location dimension I would have as many dimension rows as fact rows.
What I need to know, is where should the street addresses go (123 anywhere St.)? Should it go in the fact table? How do I handle street addresses?
Thanks.
The street address itself should go in a Fact. If it's a Real Estate app I'd imagine there'd be some kind of "Sale Contract Fact" or "Rental Contract Fact" or something similar - the street address would be an attribute of that fact.
In your case each occurrence of the address is definitely tied to a single transaction. As you said, the same street address could appear multiple times, but it would be on different Sales Contracts and thus different Fact instances.
Other elements of the address (zipcode, city, state etc) would be dimensionalised as it makes sense to group them for classification.
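A cut-down sketch of that split, using Python's sqlite3 module. The table names (fact_sale_contract, dim_zip, dim_city, dim_state), the sale_price measure and the sample rows are assumptions, and the County level is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked geography: zip -> city -> state.
CREATE TABLE dim_state (state_key INTEGER PRIMARY KEY, state_code TEXT NOT NULL);
CREATE TABLE dim_city (
    city_key  INTEGER PRIMARY KEY,
    city_name TEXT NOT NULL,
    state_key INTEGER NOT NULL REFERENCES dim_state(state_key)
);
CREATE TABLE dim_zip (
    zip_key  INTEGER PRIMARY KEY,
    zip_code TEXT NOT NULL,
    city_key INTEGER NOT NULL REFERENCES dim_city(city_key)
);
-- The street address lives on the fact row, next to the measures.
CREATE TABLE fact_sale_contract (
    contract_id    INTEGER PRIMARY KEY,
    zip_key        INTEGER NOT NULL REFERENCES dim_zip(zip_key),
    street_address TEXT NOT NULL,
    sale_price     REAL NOT NULL
);

INSERT INTO dim_state VALUES (1, 'IL');
INSERT INTO dim_city  VALUES (1, 'Springfield', 1);
INSERT INTO dim_zip   VALUES (1, '62701', 1);
INSERT INTO fact_sale_contract VALUES
    (1, 1, '123 Anywhere St.', 250000),
    (2, 1, '123 Anywhere St.', 275000),
    (3, 1, '9 Elm Ct.',        180000);
""")

# Grouping works on the dimensions; the street address is just carried along.
sales_by_city = conn.execute(
    """SELECT c.city_name, COUNT(*), SUM(f.sale_price)
       FROM fact_sale_contract f
       JOIN dim_zip  z ON z.zip_key  = f.zip_key
       JOIN dim_city c ON c.city_key = z.city_key
       GROUP BY c.city_name"""
).fetchall()
```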
Currently my users table has the below fields
Username
Password
Name
Surname
City
Address
Country
Region
TelNo
MobNo
Email
MembershipExpiry
NoOfMembers
DOB
Gender
Blocked
UserAttempts
BlockTime
Disabled
I'm not sure if I should put the address fields in another table. I have heard that I will be breaking 3NF if I don't although I can't understand why. Can someone please explain?
There are several points that are definitely not 3NF; and some questionable ones in addition:
Could there be multiple addresses per user?
Is an address optional or mandatory?
Does the information in City, Country, Region duplicate that in Address?
Could a user have multiple TelNos?
Is a TelNo optional or mandatory?
Could a user have multiple MobNos?
Is a MobNo optional or mandatory?
Could a user have multiple Emails?
Is an Email optional or mandatory?
Is NoOfMembers calculated from the count of users?
Can there be more than one UserAttempts?
Can there be more than one BlockTime per user?
If the answer to any of these questions is yes, then it indicates a problem with 3NF in that area. The reason for 3NF is to remove duplication of data; to ensure that updates, insertions and deletions leave the data in consistent form; and to minimise the storage of data - in particular there is no need to store data as "not yet known/unknown/null".
In addition to the questions asked here, there is also the question of what constitutes the primary key for your table - I would guess it is something to do with user, but name and the other information you give is unlikely to be unique, so will not suffice as a PK. (If you think name plus surname is unique are you suggesting that you will never have more than one John Smith?)
EDIT:
In the light of further information that some fields are optional, I would suggest that you separate out the optional fields into different tables, and establish 1-1 links between the new tables and the user table. This link would be established by creating a foreign key in the new table referring to the primary key of the user table. As you say none of the fields can have multiple values then they are unlikely to give you problems at present. If however any of these change, then not splitting them out will give you problems in upgrading the application and the data to support the application. You still need to address the primary key issue.
As long as every user has one address and every address belongs to one user, they should go in the same table (a 1-to-1 relationship). However, if users aren't required to enter addresses (an optional relationship) a separate table would be appropriate. Also, in the odd case that many users share the same address (e.g. they're convicts in the same prison), you have a 1-to-many relationship, in which case a separate table would be the way to go. EDIT: And yes, as someone pointed out in the comments, if users have multiple addresses (a 1-to-many the other way around), there should also be separate tables.
Just as a point that I think might help someone with this question: I once had a situation where I put addresses right in the user/site/company/etc. tables because I thought, why would I ever need more than one address for them? Then after we completed everything it was brought to my attention by a different department that we needed the possibility of recording both a shipping address and a billing address.
The moral of the story is, this is a frequent requirement, so if you think you ever might want to record shipping and billing addresses, or can think of any other type of address you might want to record for a user, go ahead and put it in a separate table.
In today's age, I think phone numbers are a no brainer as well to be stored in a separate table. Everyone has mobile numbers, home numbers, work numbers, fax numbers, etc., and even if you only plan on asking for one, people will still put two in the field and separate them by a semi-colon (trust me). Just something else to consider in your database design.
The point is that if you can imagine having two addresses for the same user in the future, you should split now and have an address table with a FK pointing back to the users table.
P.S. Your table is missing an identity to be used as PK, something like Id or UserId or DataId, call it the way you want...
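A minimal sketch of that split using Python's sqlite3 module. The names user_id, user_addresses and address_type, the two allowed address types and the helper address_of are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,   -- the identity column the table was missing
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE user_addresses (
    user_address_id INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    address_type TEXT NOT NULL CHECK (address_type IN ('billing', 'shipping')),
    address      TEXT NOT NULL,
    city         TEXT NOT NULL,
    country      TEXT NOT NULL,
    UNIQUE (user_id, address_type)  -- at most one address of each type per user
);

INSERT INTO users VALUES (1, 'jsmith');
INSERT INTO user_addresses (user_id, address_type, address, city, country)
VALUES (1, 'billing',  '1 Bank Row',   'Valletta', 'MT'),
       (1, 'shipping', '7 Harbour Rd', 'Sliema',   'MT');
""")


def address_of(user_id, address_type):
    """Return (address, city, country) for one user and type, or None."""
    return conn.execute(
        """SELECT address, city, country
           FROM user_addresses
           WHERE user_id = ? AND address_type = ?""",
        (user_id, address_type),
    ).fetchone()
```

Adding a third kind of address later means extending the CHECK list (or replacing it with a lookup table), not adding columns to users.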
By adding them to a separate table, you will have an easier time expanding your application if you decide to later. I generally have a simple user table with user_id or id, user_name, first_name, last_name, password, created_at & updated_at. I then have a profile table with the other info.
It's really all preference though.
You should never group two different types of data in a single table, period. The reason is that if your application is intended to be used in production, sooner or later different use cases will come up that require a more highly normalised table structure.
My recommendation - Adhere to SOLID principles even in DB design.
Full disclosure...Trying feverishly here to learn more about databases so I am putting in the time and also tried to get this answer from the source to no avail.
Barry Williams from databaseanswers has this schema posted.
Clients and Fees Schema
I am trying to understand the split of address tables in this schema. It's clear to me that the Addresses table contains the details of a given address. The Client_Addresses and Staff_Addresses tables are what gets me.
1) I understand the use of Primary Foreign Keys as shown but I was under the assumption that when these are used you don't have a resident Primary Key in that same table (date_address_from in this case). Can someone explain the reasoning for both and put it into words how this actually works out?
2) Why would you use date_address_from as the primary key instead of something like client_address_id as the PK? What if someone enters two addresses in one day would there be conflicts in his design? If so or if not, what?
3) Along the lines of normalization...Since both date_address_from and date_address_to are the same in the Client_Addresses and Staff_Addresses table should those fields just not be included in the main Address table?
Evaluation
First an Audit, then the specific answers.
This is not a Data Model. This is not a Database. It is a bucket of fish, with each fish drawn as a rectangle, and where the fins of one fish are caught in the gills of another, there is a line. There are masses of duplication, as well as masses of missing elements. It is completely unworthy of using as an example to learn anything about database design from.
There is no Normalisation at all; the files are very incomplete (see Mike's answer, there are a hundred more problems like that). The other_details and eg.s crack me up. Each element needs to be identified and stored: StreetNo, ApartmentNo, StreetName, StreetType, etc. not line_1_number_street, which is a group.
Customer and Staff should be normalised into a Person table, with all the elements identified.
And yes, if Customer can be either a Person or an Organisation, then a supertype-subtype structure is required to support that correctly.
So what this really is, the technically accurate terms, is a bunch of flat files, with descriptions for groups of fields. Light years distant from a database or a relational one. Not ready for evaluation or inspection, let alone building something with. In a Relational Data Model, that would be approximately 35 normalised tables, with no duplicated columns.
Barry has (wait for it) over 500 "schemas" on the web. The moment you try to use a second "schema", you will find that (a) they are completely different in terms of use and purpose (b) there is no commonality between them (c) let's say there was a customer file in both; they would be different forms of customer files.
He needs to Normalise the entire single "schema" first,
then present the single normalised data model in 500 sections or subject areas.
I have written to him about it. No response.
It is important to note also, that he has used some unrecognisable diagramming convention. The problem with these nice interesting pictures is that they convey some things but they do not convey the important things about a database or a design. It is no surprise that a learner is confused; it is not clear to experienced database professionals. There is a reason why there is a standard for modelling Relational databases, and for the notation in Data Models: they convey all the details and subtleties of the design.
There is a lot that Barry has not read about yet: naming conventions; relations; cardinality; etc, too many to list.
The web is full of rubbish, anyone can "publish". There are millions of good- and bad-looking "designs" out there, that are not worth looking at. Or worse, if you look, you will learn completely incorrect methods of "design". In terms of learning about databases and database design, you are best advised to find someone qualified, with demonstrated capability, and learn from them.
Answer
He is using composite keys without spelling it out. The PK for client_addresses is (client_id, address_id, date_address_from). That is not a bad key; evidently he expects to record addresses forever.
The notion of keeping addresses in a separate file is a good one, but he has not provided any of the fields required to store normalised addresses, so the "schema" will end up with complete duplication of addresses; in which case, he could remove addresses, and put the lines back in the client and staff files, along with their other_details, and remove three files that serve absolutely no purpose other than occupying disk space.
You are thinking about Associative Tables, which resolve the many-to-many relations in Databases. Yes, there, the columns are only the PKs of the two parent tables. These are not Associative Tables or files; they contain data fields.
It is not the PK, it is the third element of the PK.
The notion of a person being registered at more than one address in a single day is not reasonable; just count the one address they slept the most at.
Others have answered that.
Do not expect to identify any evidence of databases or design or Normalisation in this diagram.
1) In each of those tables the primary key is a compound key consisting of three attributes: (staff_id, address_id, date_address_from) and (client_id, address_id, date_address_from). This presumably means that the mapping of clients/staff to addresses is expected to change over time and that the history of those changes is preserved.
2) There's no obvious reason to create a new "id" attribute in those tables. The compound key does the job adequately. Why would you want to create the same address twice for the same client on the same date? If you did then that might be a reason to modify the design but that seems like an unlikely requirement.
3) No. The apparent purpose is that they are the applicable dates for the mapping of address to client/staff - not dates applicable to the address alone.
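To see how the compound key behaves, here is a small sketch with Python's sqlite3 module. The parent tables are omitted and the helper try_insert is hypothetical; the three key columns are the ones named in the schema under discussion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client_addresses (
    client_id         INTEGER NOT NULL,
    address_id        INTEGER NOT NULL,
    date_address_from TEXT NOT NULL,  -- ISO 8601 date
    date_address_to   TEXT,
    PRIMARY KEY (client_id, address_id, date_address_from)
);
""")


def try_insert(client_id, address_id, date_from):
    """Return True if the row is accepted, False if the compound key rejects it."""
    try:
        conn.execute(
            "INSERT INTO client_addresses (client_id, address_id, date_address_from)"
            " VALUES (?, ?, ?)",
            (client_id, address_id, date_from),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, 10, "2020-01-01"),  # first stay at address 10
    try_insert(1, 10, "2021-01-01"),  # same client and address, later date: allowed
    try_insert(1, 10, "2020-01-01"),  # exact repeat of the first row: rejected
]
```

Only the full triple has to be unique, which is why the same client can return to the same address on a later date.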
3) Along the lines of normalization...Since both date_address_from and date_address_to are the same in the Client_Addresses and Staff_Addresses table should those fields just not be included in the main Address table?
No. But you did find a problem.
The designer has decided that clients and staff are two utterly different things. By "utterly different", I mean they have no attributes in common.
That's not true, is it? Both clients and staff have addresses. I'm sure most of them have telephones, too.
Imagine that someone on staff is also a client. How many places is that person's name stored? That person's address? Can you hear Mr. Rogers in the background saying, "Can you spell 'update anomaly'? . . . I knew you could."
The problem is that the designer was thinking of clients and staff as different kinds of people. They're not. "Client" describes a business relationship between a service provider (usually, that is, not a retailer) and a customer, which might be either a person or a company. "Staff" describes an employment relationship between a company and a person. Not different kinds of people--different kinds of relationships.
Can you see how to fix that?
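One possible reading of that hint, sketched with Python's sqlite3 module: store each person once and model "client" and "staff" as rows that refer to the person. The table and column names are assumptions, and the sketch ignores the case where a client is a company rather than a person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Name and phone are stored once, whatever roles the person plays.
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    phone     TEXT
);
-- "Client" and "staff" are relationships to a person, not kinds of person.
CREATE TABLE client (
    client_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(person_id)
);
CREATE TABLE staff (
    staff_id  INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(person_id)
);

INSERT INTO person VALUES (1, 'Pat Jones', '555-0100');
INSERT INTO client VALUES (1, 1);
INSERT INTO staff  VALUES (1, 1);
""")

# One update, and both roles see the new name: no update anomaly.
conn.execute("UPDATE person SET name = 'Pat Smith' WHERE person_id = 1")

client_name = conn.execute(
    "SELECT p.name FROM client c JOIN person p ON p.person_id = c.person_id"
    " WHERE c.client_id = 1"
).fetchone()[0]
staff_name = conn.execute(
    "SELECT p.name FROM staff s JOIN person p ON p.person_id = s.person_id"
    " WHERE s.staff_id = 1"
).fetchone()[0]
```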
These 2 extra tables enable you to keep an address history per person.
You can have them both in one table, but since staff and client are separated, it is better to separate them as well (because client id = 1 and staff id = 1 can't both be used in the same address table).
There is no "single" solution to a design problem; you can use one person table and then add a column to differentiate between staff and client. BUT the major idea is that the DB should be clear, readable and efficient, not that it should save on tables.
About 2 - the PK is combined: clientID, AddressID and the from date together.
So if someone lives 6 months in the States, then 6 months in Israel, and then moves back to the States, to the same address, you need only 2 addresses in the address table, and 3 rows in client_address.
The idea of having the from date as part of the key is right, although it doesn't guarantee data integrity, as you also need to check manually that there are no overlapping dates between records of the same person.
About 3 - no (look at 2).
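The manual overlap check could look roughly like this, using Python's sqlite3 module. The sketch assumes ISO 8601 text dates, treats each period as running from date_address_from up to but not including date_address_to, and treats a NULL date_address_to as "still current"; the helper find_overlaps is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client_addresses (
    client_id         INTEGER NOT NULL,
    address_id        INTEGER NOT NULL,
    date_address_from TEXT NOT NULL,
    date_address_to   TEXT,           -- NULL means the client still lives there
    PRIMARY KEY (client_id, address_id, date_address_from)
);
-- States (address 10), then Israel (address 20), then back to address 10:
-- two distinct addresses, three history rows.
INSERT INTO client_addresses VALUES
    (1, 10, '2020-01-01', '2020-07-01'),
    (1, 20, '2020-07-01', '2021-01-01'),
    (1, 10, '2021-01-01', NULL);
""")


def find_overlaps():
    """Return pairs of periods for the same client that overlap in time."""
    return conn.execute(
        """SELECT a.client_id, a.date_address_from, b.date_address_from
           FROM client_addresses a
           JOIN client_addresses b
             ON a.client_id = b.client_id
            AND a.rowid < b.rowid   -- compare each pair of rows once
            AND a.date_address_from < COALESCE(b.date_address_to, '9999-12-31')
            AND b.date_address_from < COALESCE(a.date_address_to, '9999-12-31')"""
    ).fetchall()


overlaps_before = find_overlaps()

# The key accepts this row, although it overlaps two existing periods.
conn.execute(
    "INSERT INTO client_addresses VALUES (1, 20, '2020-06-01', '2020-08-01')"
)
overlaps_after = find_overlaps()
```

The primary key happily accepts the fourth row, so only a check like find_overlaps (or an equivalent trigger) notices that the history no longer makes sense.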
Viewing the data model, I think:
1) PF means that the field is both part of the primary key of the table and a foreign key to another table.
2) In the same way, the primary key of Staff_Addresses is {staff_id, address_id, date_address_from}, not just date_address_from.
3) The same as 2)
In reference to the Staff_Addresses table, having date_address_from in the Primary Key basically prevents a record with the same staff_id/address_id from being entered more than once for the same date. Now, I'm no DBA, but I like my PKs to be integers or GUIDs for performance reasons/faster indexing. If I were to do this I would make a new column, say, Staff_Address_Id, make it the PK column, and put a unique constraint on staff_id/address_id/date_address_from.
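A sketch of that surrogate-key variant with Python's sqlite3 module. Staff_Address_Id is the column name suggested above; the helper add_staff_address and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff_Addresses (
    Staff_Address_Id  INTEGER PRIMARY KEY,  -- surrogate integer key
    staff_id          INTEGER NOT NULL,
    address_id        INTEGER NOT NULL,
    date_address_from TEXT NOT NULL,
    date_address_to   TEXT,
    UNIQUE (staff_id, address_id, date_address_from)
);
""")


def add_staff_address(staff_id, address_id, date_from):
    """Insert a row and return its surrogate key, or None if it is a duplicate."""
    try:
        cur = conn.execute(
            "INSERT INTO Staff_Addresses (staff_id, address_id, date_address_from)"
            " VALUES (?, ?, ?)",
            (staff_id, address_id, date_from),
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


first_id = add_staff_address(5, 10, "2022-01-01")
duplicate_id = add_staff_address(5, 10, "2022-01-01")
```

The unique constraint keeps the same protection the compound key gave; the surrogate column only changes what other tables would reference.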
As for your last concern, the Addresses table is really a generic address storage structure. It shouldn't care about the date ranges during which someone resided there. That is better left to specific uses of an address, such as the Client/Staff address tables.
Hope this helps a little.