Best practices for designing databases with addresses - database

I was studying a design case that discusses how to design the database for a system that is already under development. The system is for managing public parking. The main debate is how to meet the following requirements:
The system shall allow managing information of each parking company.
The data stored is the parking lot's address (details, province, county, district), name, corporate identification, and number of spaces.
The system should handle reports of the revenue generated by each parking.
These reports should allow filtering by province, as the company has several parking lots per province, some of them located in the same district.
Some people have mentioned that the ParkingLot table must have columns for province, county and district, since these are characteristics of each parking lot.
Others say it should not. I was wondering what the best way is to approach the design of the database on this specific point.
Which boils down to my questions: in this type of case (because there must be exceptions), where the system is pretty straightforward, does it really matter?
What if these reports should allow filtering by province? As a company could have several parking lots in the same province, some of them located in the same district.

Based on your requirements, you need to decide what to model.
Option 1 - Columns on ParkingLot
If you add columns for province, county and district to the ParkingLot table, then you are modelling that each parking lot has these things.
Your model in this case does not include a master list of districts, counties and provinces. You would not be able to validate that the details entered for a particular lot were valid.
Choose this option if you obtain the parking lots from a trusted source or you do not need to validate district, etc. This option doesn't allow you to store any additional information, e.g. if you need to store an "area manager" against a particular region, you would have nowhere to store it.
Option 2 - Model Districts, County and Provinces
You could create tables for province, county and district. These would have foreign keys to their parents. In this case you do have a master list (which you need to keep up to date).
Then on each parking lot you have a foreign key to the district (which implies the other columns).
Choose this option if you want to validate district, etc. against a master list. Also choose this if you have extra information to store against district, etc.
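To make Option 2 concrete, here is a minimal sketch using Python's built-in sqlite3 module. All table and column names (province, county, district, parking_lot, revenue, amount_cents) are assumptions for illustration, not names from the original system, and the revenue table is a hypothetical stand-in for whatever the reports are built from.

```python
import sqlite3

# Hypothetical schema: each level holds a foreign key to its parent,
# and a parking lot only references its district.
SCHEMA = """
CREATE TABLE province (
    province_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE county (
    county_id   INTEGER PRIMARY KEY,
    province_id INTEGER NOT NULL REFERENCES province(province_id),
    name        TEXT NOT NULL
);
CREATE TABLE district (
    district_id INTEGER PRIMARY KEY,
    county_id   INTEGER NOT NULL REFERENCES county(county_id),
    name        TEXT NOT NULL
);
CREATE TABLE parking_lot (
    parking_lot_id  INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    address_details TEXT,
    spaces          INTEGER NOT NULL,
    district_id     INTEGER NOT NULL REFERENCES district(district_id)
);
CREATE TABLE revenue (
    parking_lot_id INTEGER NOT NULL REFERENCES parking_lot(parking_lot_id),
    amount_cents   INTEGER NOT NULL
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def revenue_for_province(conn, province_name):
    """Total revenue of all parking lots located in the given province."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(r.amount_cents), 0)
        FROM revenue r
        JOIN parking_lot p ON p.parking_lot_id = r.parking_lot_id
        JOIN district d    ON d.district_id = p.district_id
        JOIN county c      ON c.county_id = d.county_id
        JOIN province pr   ON pr.province_id = c.province_id
        WHERE pr.name = ?
        """,
        (province_name,),
    ).fetchone()
    return row[0]
```

With the foreign keys enforced, inserting a parking lot that points at a district missing from the master list is rejected by the database. The price is the extra joins in the report query, which Option 1 would not need.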

Related

Common identifier for a person across disparate systems

Not sure which is the best Stack Exchange site for this, so will try my hand here.
I have a web application that stores user disciplinary data for organisations. Rather than have clients enter their staff into multiple systems, some want to push the basic personnel data (such as First Name, Surname, DOB, Job Title, etc.) into ours from their source (e.g. HR/ERP) databases.
Our clients are using a range of existing systems to store their data, such as Oracle, SAP, JD Edwards, etc.
I am familiar with the technical methods to get this data (e.g. web service, web API), but not for a case such as when a person's surname changes (e.g. Janet Smith gets married and becomes Janet Doe). Unless there is a unique identifier for that person across both systems, I can't see how that change can be managed reliably.
How is this process best-managed please? Is an additional field added to the destination database that contains the UID of the source data? Or, do both parties agree on a common field, e.g. employee number, that never changes?
This issue arises in many circumstances. One case is in the Texas school system where students are tracked longitudinally through numerous education subsystems. A social security number, providing a unique identifier in some cases (although not all) was considered too sensitive for use. Thus, a unique identifier has been generated for each student and staff member. This is part of the permanent information associated with each individual, regardless of employment change, location change, or name change.
This link describes the rationale for the unique id.
This link is the documentation on the Texas Student Data System (TSDS) unique identifier. You might find the XML examples at the end of the document of most interest. Much of the information involves submitting requests for an id where demographic information is needed for disambiguation.
Basically, something similar to a Java UUID as an extra field in the database should be sufficient to achieve your aim.
Hope this helps.
Yes, the UID is the only solution. This problem comes up in medical systems too, for example. Another is photos, I'm not sure which causes more problems!
One approach I know of is to add an "external_id" field for this. If several source systems are involved, you can store several external ids per person.
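As a rough illustration of the "external_id" idea, the sketch below keeps the source system's identifier in a separate mapping table and matches incoming records on that identifier instead of on the name. It uses Python's sqlite3 module; the table names, the column names and the sync_person helper are all hypothetical.

```python
import sqlite3


def make_db():
    """In-memory database with a person table and an external-id mapping."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id  INTEGER PRIMARY KEY,
            first_name TEXT NOT NULL,
            surname    TEXT NOT NULL
        );
        -- One row per (source system, id in that system); several source
        -- systems can therefore point at the same person.
        CREATE TABLE person_external_id (
            source_system TEXT NOT NULL,
            external_id   TEXT NOT NULL,
            person_id     INTEGER NOT NULL REFERENCES person(person_id),
            PRIMARY KEY (source_system, external_id)
        );
        """
    )
    return conn


def sync_person(conn, source_system, external_id, first_name, surname):
    """Insert or update a person, matched on the external id only."""
    row = conn.execute(
        "SELECT person_id FROM person_external_id "
        "WHERE source_system = ? AND external_id = ?",
        (source_system, external_id),
    ).fetchone()
    if row is None:
        # First time this external id is seen: create the person and the mapping.
        cur = conn.execute(
            "INSERT INTO person (first_name, surname) VALUES (?, ?)",
            (first_name, surname),
        )
        person_id = cur.lastrowid
        conn.execute(
            "INSERT INTO person_external_id VALUES (?, ?, ?)",
            (source_system, external_id, person_id),
        )
    else:
        # Known external id: overwrite the mutable fields, such as the surname.
        person_id = row[0]
        conn.execute(
            "UPDATE person SET first_name = ?, surname = ? WHERE person_id = ?",
            (first_name, surname, person_id),
        )
    return person_id
```

Calling sync_person twice with the same external id, first as Janet Smith and later as Janet Doe, updates the one existing row instead of creating a second person.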

SQL - Designing a Phone book database with Hierarchical model (master-client)

I just joined this site and this is my first question, so I hope it complies with the StackOverflow question policy.
I'm designing a DB for a phone book which has the following abilities:
Contacts have 2 types (Company or Person) -> ContactType
I want each contact to have as many emails, phone numbers, and addresses as it wants.
I want to specify which Person works in which Company, so I can show not only a Company's contact details but also a list of its employees, their jobs in that Company, and their contacts (CoEmpJob table).
I have designed a db diagram, which is shown in the link below. Is it well structured, or can I achieve what I want in some better way?
Thanks in advance.
My Phone Book Design
As the design stands, you're missing a few things, such as a Companies table and a ContactTypes table. There seems to be no foreign key in the CoEmpJob table linking to the Contacts table.
In the Phones table, I personally wouldn't use a prefix field unless you wish to display contacts by phone prefix. If every phone number were guaranteed to be unique, the PhoneNum field could become the primary key and the PhoneID field would be unnecessary. But you might have the case in which husband and wife are in the same database: whilst they almost certainly have different mobile numbers, they almost certainly share the same home phone number! In this case, your design is correct.
I don't know how many people have more than one address (I would think very few, if at all) which means that the fields of the Address table could be moved into the Contacts table.
(Added)
As regards the companies, if you want to specify which Person works in which Company, then you will need a companies table (missing) and a join table (CoEmpJob). In the real world, this design would also require more tables - a join table can show which contacts are connected to which companies and what their current jobs are, but people change jobs (and companies) and so such a design would not store any history. Also, it is customary to link people (employees) to a department - and it is possible that one person can be connected to more than one department at a time, meaning that you will need another join table. This can get very complicated - it depends on what you want.
Your comment suggests that you want to store company data in the contacts table - this is a very bad idea; they should be kept separate.
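A possible shape for the separate companies table and the CoEmpJob join table is sketched below with Python's sqlite3 module. Only the table names come from the discussion above; every column name is an assumption, and the sketch deliberately ignores job history and departments.

```python
import sqlite3


def open_phonebook_db():
    """In-memory database with people, companies and the join table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the links below
    conn.executescript(
        """
        CREATE TABLE Contacts (
            ContactID INTEGER PRIMARY KEY,
            Name      TEXT NOT NULL
        );
        CREATE TABLE Companies (
            CompanyID INTEGER PRIMARY KEY,
            Name      TEXT NOT NULL
        );
        -- Join table: which contact currently holds which job at which company.
        CREATE TABLE CoEmpJob (
            ContactID INTEGER NOT NULL REFERENCES Contacts(ContactID),
            CompanyID INTEGER NOT NULL REFERENCES Companies(CompanyID),
            JobTitle  TEXT NOT NULL,
            PRIMARY KEY (ContactID, CompanyID)
        );
        """
    )
    return conn


def employees_of(conn, company_name):
    """Return (contact name, job title) pairs for one company, sorted by name."""
    return conn.execute(
        """
        SELECT c.Name, j.JobTitle
        FROM CoEmpJob j
        JOIN Contacts c   ON c.ContactID = j.ContactID
        JOIN Companies co ON co.CompanyID = j.CompanyID
        WHERE co.Name = ?
        ORDER BY c.Name
        """,
        (company_name,),
    ).fetchall()
```

The composite primary key allows one job per person per company. Storing job history would need at least a start date in the key, which is where the design starts to grow.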

Why use a 1-to-1 relationship in database design?

I am having a hard time trying to figure out when to use a 1-to-1 relationship in db design or if it is ever necessary.
If you can select only the columns you need in a query, is there ever a point in breaking up a table into 1-to-1 relationships? I guess updating a large table has more impact on performance than updating a smaller table, and I'm sure it depends on how heavily the table is used for certain operations (reads/writes).
So when designing a database schema how do you factor in 1-to-1 relationships? What criteria do you use to determine if you need one, and what are the benefits over not using one?
From the logical standpoint, a 1:1 relationship should always be merged into a single table.
On the other hand, there may be physical considerations for such "vertical partitioning" or "row splitting", especially if you know you'll access some columns more frequently or in a different pattern than the others, for example:
You might want to cluster or partition the two "endpoint" tables of a 1:1 relationship differently.
If your DBMS allows it, you might want to put them on different physical disks (e.g. more performance-critical on an SSD and the other on a cheap HDD).
You have measured the effect on caching and you want to make sure the "hot" columns are kept in cache, without "cold" columns "polluting" it.
You need a concurrency behavior (such as locking) that is "narrower" than the whole row. This is highly DBMS-specific.
You need different security on different columns, but your DBMS does not support column-level permissions.
Triggers are typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do. For example, Oracle doesn't let you modify the so called "mutating" table from a row-level trigger - by having separate tables, only one of them may be mutating so you can still modify the other from your trigger (but there are other ways to work-around that).
Databases are very good at manipulating the data, so I wouldn't split the table just for the update performance, unless you have performed the actual benchmarks on representative amounts of data and concluded the performance difference is actually there and significant enough (e.g. to offset the increased need for JOINing).
On the other hand, if you are talking about "1:0 or 1" (and not a true 1:1), this is a different question entirely, deserving a different answer...
See also: When I should use one to one relationship?
Separation of duties and abstraction of database tables.
If I have a user and I design the system for each user to have an address, but then I change the system, all I have to do is add a new record to the Address table instead of adding a brand new table and migrating the data.
EDIT
Right now, if you want a person record where each person has exactly one address record, you could have a 1-to-1 relationship between a Person table and an Address table, or you could just have a Person table that also has the columns for the address.
In the future you might decide to allow a person to have multiple addresses. You would not have to change your database structure in the 1-to-1 relationship scenario; you would only have to change how you handle the data coming back to you. However, in the single-table structure you would have to create a new table and migrate the address data to it in order to create a best-practice 1-to-many relationship database structure.
Well, on paper, normalized form looks to be the best. In the real world it is usually a trade-off. Most large systems that I know make trade-offs and do not try to be fully normalized.
I'll try to give an example. Suppose you have a banking application with 10 million passbook accounts, and the usual transaction is just a query of the latest balance of a certain account. You have table A that stores just that information (account number, account balance, and account holder name).
Your account also has another 40 attributes, such as the customer address, tax number, and ids for mapping to other systems, which are in table B.
A and B have one to one mapping.
In order to retrieve the account balance fast, you may want to employ a different indexing strategy (such as a hash index) for the small table that has the account balance and account holder name.
The table that contains the other 40 attributes may reside in a different tablespace or storage and employ a different type of indexing, for example because you want to sort by name, account number, branch id, etc. Your system can tolerate slow retrieval of these 40 attributes, while you need fast retrieval for your account balance query by account number.
Having all 43 attributes in one table seems natural, but is probably 'naturally slow' and unacceptable for just retrieving a single account balance.
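A minimal sketch of that split, using Python's sqlite3 module, might look like the following. The names account_hot and account_cold and their columns are hypothetical, and only two of the 40 "cold" attributes are shown. SQLite has no hash indexes or tablespaces, so this only illustrates the table layout, not the storage tuning described above.

```python
import sqlite3


def open_bank_db():
    """In-memory database with the account split into a hot and a cold table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Table A: the few columns needed by the frequent balance query.
        CREATE TABLE account_hot (
            account_number TEXT PRIMARY KEY,
            balance_cents  INTEGER NOT NULL,
            holder_name    TEXT NOT NULL
        );
        -- Table B: rarely read attributes, sharing the same primary key.
        CREATE TABLE account_cold (
            account_number   TEXT PRIMARY KEY
                             REFERENCES account_hot(account_number),
            customer_address TEXT,
            tax_number       TEXT
        );
        """
    )
    return conn


def latest_balance(conn, account_number):
    """Frequent query: touches only the small table."""
    row = conn.execute(
        "SELECT balance_cents FROM account_hot WHERE account_number = ?",
        (account_number,),
    ).fetchone()
    return None if row is None else row[0]


def full_account(conn, account_number):
    """Rare query: joins both halves back into one logical row."""
    return conn.execute(
        """
        SELECT h.account_number, h.balance_cents, h.holder_name,
               c.customer_address, c.tax_number
        FROM account_hot h
        LEFT JOIN account_cold c ON c.account_number = h.account_number
        WHERE h.account_number = ?
        """,
        (account_number,),
    ).fetchone()
```

Because both tables use the account number as primary key, each account has at most one row in each half. Whether the split actually pays off is something to measure on a real DBMS with representative data.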
It makes sense to use 1-1 relationships to model an entity in the real world. That way, when more entities are added to your "world", they only also have to relate to the data that they pertain to (and no more).
That's the key really, your data (each table) should contain only enough data to describe the real-world thing it represents and no more. There should be no redundant fields as all make sense in terms of that "thing". It means that less data is repeated across the system (with the update issues that would bring!) and that you can retrieve individual data independently (not have to split/ parse strings for example).
To work out how to do this, you should research "Database Normalisation" (or Normalization), "Normal Form" and "first, second and third normal form". This describes how to break down your data. A version with an example is always helpful. Perhaps try this tutorial.
Often people are talking about a 1:0..1 relationship and call it a 1:1. In reality, a typical RDBMS cannot support a literal 1:1 relationship in any case.
As such, I think it's only fair to address sub-classing here, even though it technically necessitates a 1:0..1 relationship, and not the literal concept of a 1:1.
A 1:0..1 is quite useful when you have fields that would be exactly the same among several entities/tables. For example, contact information fields such as address, phone number, email, etc. that might be common for both employees and clients could be broken out into an entity made purely for contact information.
A contact table would hold common information, like address and phone number(s).
So an employee table holds employee specific information such as employee number, hire date and so on. It would also have a foreign key reference to the contact table for the employee's contact info.
A client table would hold client information, such as an email address, their employer name, and perhaps some demographic data such as gender and/or marital status. The client would also have a foreign key reference to the contact table for their contact info.
In doing this, every employee would have a contact, but not every contact would have an employee. The same concept would apply to clients.
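The contact/employee/client layout can be sketched as follows with Python's sqlite3 module. The column names are assumptions, and the UNIQUE constraint on contact_id is my own addition to keep the relationship at "at most one employee per contact"; the description above does not say how that is enforced.

```python
import sqlite3


def open_contacts_db():
    """In-memory database with a shared contact table and two sub-entities."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE contact (
            contact_id INTEGER PRIMARY KEY,
            address    TEXT,
            phone      TEXT
        );
        -- NOT NULL: every employee has a contact.
        -- UNIQUE (assumption): a contact belongs to at most one employee.
        CREATE TABLE employee (
            employee_number INTEGER PRIMARY KEY,
            hire_date       TEXT NOT NULL,
            contact_id      INTEGER NOT NULL UNIQUE
                            REFERENCES contact(contact_id)
        );
        CREATE TABLE client (
            client_id     INTEGER PRIMARY KEY,
            employer_name TEXT,
            contact_id    INTEGER NOT NULL UNIQUE
                          REFERENCES contact(contact_id)
        );
        """
    )
    return conn


def contacts_without_employee(conn):
    """Ids of contacts that no employee row points to (the '0' in 1:0..1)."""
    rows = conn.execute(
        """
        SELECT c.contact_id
        FROM contact c
        LEFT JOIN employee e ON e.contact_id = c.contact_id
        WHERE e.employee_number IS NULL
        ORDER BY c.contact_id
        """
    ).fetchall()
    return [r[0] for r in rows]
```

Note that nothing here stops a contact row from existing with neither an employee nor a client attached, which is exactly why this is a 1:0..1 and not a literal 1:1.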
Just a few samples from past projects:
a TestRequests table can have only one matching Report. But depending on the nature of the Request, the fields in the Report may be totally different.
in a banking project, an Entities table holds various kinds of entities: Funds, RealEstateProperties, Companies. Most of those Entities have similar properties, but Funds require about 120 extra fields, while they represent only 5% of the records.

Supertype/subtype database schema question

I have a database of resources with the typical address, email and all that jazz. One resource can be used by one or more counties. The resources are categorized by Education, Health Care and a couple of others. A resource will only ever have one category, so it cannot be both education and health care, for instance. I would like to use the supertype/subtype relationship. Currently, none of the categories (health care, education, etc.) has any differing attributes. How could I amend my schema to accommodate that?
Below is a screen cap of my current schema.
http://imgur.com/fbrFB
The whole point of a supertype/subtype structure is to gather the attributes common to all subtypes together in one table, the supertype, and to isolate the attributes unique to each subtype in separate tables.
If all your subtypes have the same attributes, what's the point?
I think you'll get more benefit from reconsidering how you're going to handle addresses. Anyone who has a PO box for mail is liable to have different ZIP codes for their physical and mailing addresses.

Database Design: one table against three tables

I have the following situation and would like to know your opinion and inputs.
My application has a company. Each company has a lot of departments, which in turn have a lot of users.
I have a calendar at all the levels. So there is a central calendar for the company, a separate calendar for each department, and a separate calendar for each user. The idea is that when a user is interested in an event in the company calendar, he/she can add it to their own calendar. Now, I need to have one or many tables for events. I was wondering whether:
I should have one table, which will have a field to uniquely identify each entity (company, department, user), so that depending on who is querying, I can retrieve the results accordingly.
I should have multiple tables: one table for the company, one table for departments, and one table for users.
So it is more like having one table against three tables.
Look at the application requirements - if the events for the different levels are essentially the same (have the same data requirements, behavior), you should probably just use the one table. If they are going to be different, use different tables.
I don't think events need to know directly whether they belong to the company, department or user. I would suspect that events belong to a calendar, and a calendar belongs to the company, department or user?
So a Calendar table:
CALENDAR_ID
And then an event table
EVENT_ID
And an "EVENT_TO_CALENDAR" table (for the purposes of a many-to-many relationship):
CALENDAR_ID
EVENT_ID
If a user can see the event in the company calendar, they can say "add it to mine" that will create a new record in the EVENT_TO_CALENDAR table, with the same EVENT_ID but their unique CALENDAR_ID. The event is now linked to each calendar (the company's and the user's).
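The three tables above could be sketched like this with Python's sqlite3 module. The table and id column names follow the answer; the owner_label and title columns and the two helper functions are assumptions added so the example has something to show.

```python
import sqlite3


def open_calendar_db():
    """In-memory database with calendars, events and the link table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE calendar (
            calendar_id INTEGER PRIMARY KEY,
            owner_label TEXT NOT NULL      -- hypothetical: who owns it
        );
        CREATE TABLE event (
            event_id INTEGER PRIMARY KEY,
            title    TEXT NOT NULL         -- hypothetical
        );
        -- Many-to-many link: one event can appear in many calendars.
        CREATE TABLE event_to_calendar (
            calendar_id INTEGER NOT NULL REFERENCES calendar(calendar_id),
            event_id    INTEGER NOT NULL REFERENCES event(event_id),
            PRIMARY KEY (calendar_id, event_id)
        );
        """
    )
    return conn


def add_event_to_calendar(conn, event_id, calendar_id):
    """'Add it to mine': link an existing event to another calendar."""
    # OR IGNORE makes adding the same event twice a harmless no-op.
    conn.execute(
        "INSERT OR IGNORE INTO event_to_calendar (calendar_id, event_id) "
        "VALUES (?, ?)",
        (calendar_id, event_id),
    )


def events_in_calendar(conn, calendar_id):
    """Titles of all events linked to one calendar, ordered by event id."""
    rows = conn.execute(
        """
        SELECT e.title
        FROM event_to_calendar ec
        JOIN event e ON e.event_id = ec.event_id
        WHERE ec.calendar_id = ?
        ORDER BY e.event_id
        """,
        (calendar_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

The event row itself is never copied: a user adding a company event only creates one more row in event_to_calendar, so a later change to the event shows up in every calendar that links to it.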
I think I would argue for three tables. First of all, if you choose one table, it will get three nullable foreign keys. This means you cannot guarantee consistency just from the database model but you have to guarantee it somewhere in your business logic. Second, as time passes you will very likely find out that a company calendar is slightly different than an employee or a department calendar. The latter may require an additional column, for example. You just cannot predict this.
