Creating an address database - sql-server

I am re-creating a part of my company’s database because it does not meet future needs.
Currently we have mainly a flat file and some disjointed tables that were never fully realized.
My thinking is that we have a table for each category, except maybe the zips table, which may serve as the table that connects it all together.
Please refer to image below:
Database Diagram http://www.freeimagehosting.net/uploads/248cc7e884.jpg
One thing I am thinking of is removing the zip table and just putting the zip code in the zipstocities table since the zip code is almost unique and then indexing the table on the zip code. The only downside is zip code has to be a varchar to take care of zip codes with leading zeros. Just want to know if there is a flaw in my logic.

I don't know the US ZIP code and territorial division system well, but I assume it's somewhat like the German one.
A state has many counties.
A county has many cities.
A city has many zip codes.
Hence I would use the following schema.
ZipCodes          CityZipCodes
------------      ----------------       Cities
ZipCode (PK) <─── ZipCode (PK)(FK)       -----------
                  City (PK)(FK)    ───>  CityId (PK)
                                         Name
                                         County (FK) ───┐
                                                        │
                                                        │
                  Counties                              │
                  -------------                         │
States            CountyId (PK) <───────────────────────┘
----------------- Name
StateId (PK) <─── State (FK)
Name
Abbreviation
Fixed for multiple cities per ZIP code.
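As a rough sketch, the schema in the diagram could be written as DDL along these lines. It uses Python's built-in sqlite3 module only so that it runs anywhere; the column types are assumptions, the sample row is purely illustrative, and the SQL Server syntax would differ slightly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
CREATE TABLE States (
    StateId      INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    Abbreviation TEXT NOT NULL
);
CREATE TABLE Counties (
    CountyId INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL,
    State    INTEGER NOT NULL REFERENCES States (StateId)
);
CREATE TABLE Cities (
    CityId INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL,
    County INTEGER NOT NULL REFERENCES Counties (CountyId)
);
CREATE TABLE ZipCodes (
    ZipCode TEXT PRIMARY KEY              -- text, so leading zeros survive
);
CREATE TABLE CityZipCodes (
    ZipCode TEXT    NOT NULL REFERENCES ZipCodes (ZipCode),
    City    INTEGER NOT NULL REFERENCES Cities (CityId),
    PRIMARY KEY (ZipCode, City)           -- many-to-many between ZIPs and cities
);
""")

# One illustrative row per table (surrogate IDs are arbitrary).
conn.execute("INSERT INTO States VALUES (1, 'New Hampshire', 'NH')")
conn.execute("INSERT INTO Counties VALUES (1, 'Coos', 1)")
conn.execute("INSERT INTO Cities VALUES (1, 'Errol', 1)")
conn.execute("INSERT INTO ZipCodes VALUES ('03579')")
conn.execute("INSERT INTO CityZipCodes VALUES ('03579', 1)")

# Walk from a ZIP code up to its state through every level of the hierarchy.
rows = conn.execute("""
    SELECT z.ZipCode, c.Name, s.Abbreviation
    FROM CityZipCodes z
    JOIN Cities   c  ON c.CityId    = z.City
    JOIN Counties co ON co.CountyId = c.County
    JOIN States   s  ON s.StateId   = co.State
""").fetchall()
print(rows)
```

Note that this sketch makes County mandatory for every city, which is exactly the assumption that cities outside any county would break.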

One thing you should be aware of is that not all cities are in counties. In Virginia you are in either a city or county but never both.

Looking at the diagram you have, the state table is the only one of the 4 outside tables that is really necessary. Lookup tables with just an ID and a single value aren't worth the effort. These relationships are designed to make a single value in the main table (ziptocities) refer to a set of related data in the lookup table (states).

You'll need to ask yourself why you care about counties. In many states in the US, they have little importance beyond tradition and maps.
The other question is how important it is that the address be accurate. How many deaths will there be if important letters are not delivered in a timely manner? (Possibly many, if the letter is about prescription drug recalls!)
You probably want to think about using data from the Postal Service, possibly using a product that corrects addresses. That way, when you get a good address, you'll be certain the mail can be delivered there - because the Postal Service will have said so!

There seem to be flaws in both your process and your logic.
I suggest that you stop thinking about tables and relationships for a moment. Instead, think about facts. Make a list of valid addresses that your database needs to support. Many surprises await you.
Don't confuse an address with a mailing label. They're not at all the same thing. Consider modeling carriers, too. In the US, whether an address is valid depends on the carrier. For example, my PO box is a valid address when the carrier is the USPS, but not when the carrier is UPS.
To save time, you might try browsing some international address formats on bitboost.

Will your logic work if two countries happen to have the same zip code? These two would point to different cities in that case. Here are some points to consider:
Do you want to use the zip code as a kind of primary key into the address (at least for the city, state and country fields)? In that case, you can have zipcode, city, state and country in one table and create indexes on city, state, etc. You have a functional dependency of the form zipcode -> country, state, city. As I said, this may not hold across countries.
If auto-populating is your only concern, create a materialized view and use it.
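A minimal sketch of the single-table idea, with the country added to the key so that the same code in two countries cannot collide. The table name ZipLookup, the column names and the rows are all placeholders, and the sketch assumes one city per code within a country, which may not hold in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ZipLookup (
        CountryCode TEXT NOT NULL,
        ZipCode     TEXT NOT NULL,   -- text, so leading zeros survive
        City        TEXT NOT NULL,
        State       TEXT NOT NULL,
        PRIMARY KEY (CountryCode, ZipCode)
    )
""")
# Secondary index for searching by city and state.
conn.execute("CREATE INDEX ix_ziplookup_city ON ZipLookup (City, State)")

conn.executemany(
    "INSERT INTO ZipLookup VALUES (?, ?, ?, ?)",
    [
        ("XX", "01234", "City A", "State A"),  # placeholder rows
        ("YY", "01234", "City B", "State B"),  # same code, different country
    ],
)


def lookup(country_code, zip_code):
    """Return (city, state) for a country/ZIP pair, or None if unknown."""
    return conn.execute(
        "SELECT City, State FROM ZipLookup WHERE CountryCode = ? AND ZipCode = ?",
        (country_code, zip_code),
    ).fetchone()


print(lookup("XX", "01234"))
print(lookup("YY", "01234"))
print(lookup("XX", "1234"))  # the leading zero matters: this is a different code
```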
I would recommend reading 'Data Model patterns' by David C. Hay.

But not every person who has a valid medical claim is required by law to remain in the US until the claim is settled. People move.

San Francisco is a city in California; it's not a city in Alabama. Does your design prevent nonsense entries like "San Francisco, AL"?
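One way a design can rule out such entries is a composite foreign key from the address to a table of valid city/state pairs. The sketch below uses Python's sqlite3 module for runnability; the table and column names and the street value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
CREATE TABLE Cities (
    City  TEXT NOT NULL,
    State TEXT NOT NULL,
    PRIMARY KEY (City, State)
);
CREATE TABLE Addresses (
    AddressId INTEGER PRIMARY KEY,
    Street    TEXT NOT NULL,
    City      TEXT NOT NULL,
    State     TEXT NOT NULL,
    -- The pair must exist in Cities, not just each value separately.
    FOREIGN KEY (City, State) REFERENCES Cities (City, State)
);
INSERT INTO Cities VALUES ('San Francisco', 'CA');
""")


def try_insert(street, city, state):
    """Insert an address; return False if the city/state pair is unknown."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Addresses (Street, City, State) VALUES (?, ?, ?)",
                (street, city, state),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert("1 Example St", "San Francisco", "CA")
bad = try_insert("1 Example St", "San Francisco", "AL")
print(ok, bad)
```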

Related

Does City and Zip in Address Table Violate BCNF

Let's say you have the following:
attribute        constraints
---------        -----------
id               (PK)
location_name    **
street
zipcode
city
** Realistically unique, but not used as a key, for the sake of future-proofing.
Would this violate BCNF as zipcode can be used to find the city? Although cities can share zipcodes and vice versa, a city can't be in two separate zipcodes where another city is part of that zipcode?
(zipcode1 --> city1) and (zipcode2 --> city1 and city2)
(Note that zipcode and city are not a composite superkey, as multiple locations can be associated with the same zipcode and city.) Is BCNF suggesting that you should have a completely separate table JUST for pairing cities and zipcodes?
States are omitted because this database is for a single state. Although in that case, would you have to have 3 tables, since a zipcode cannot be in multiple states (edit: apparently some are, but assume there aren't)? That seems too dumb to be true, and way too many unions would be needed.
I honestly don't understand much of anything regarding key terms and have just been left confused (an answer in layman's terms and/or technical terms would be highly appreciated). I tried searching for an answer because I figured it would be common, but couldn't find anything. Given my inability to organize and process mathematical logic, I'm starting to wonder if I picked the wrong field to enter.
What does a five-digit zipcode actually determine? As I understand it, it determines a post office. This is enough to route every piece of mail from wherever it is to a destination post office. That post office then delivers it locally.
Figuring out what the dependencies are between zip code and state or zip code and city, or zip code and street plus number or apartment number can be the devil's own business.
The area served by a post office is generally part of some community that the Post Office is in, like a town. But there are quirks.
The residents of Magalloway, ME are served by the post office in nearby Errol, NH. They therefore use zipcode 03579, the same as the residents of downtown Errol. The letters get forwarded to the Errol post office, then delivered to them in Maine. This may seem very strange, but it works out well in terms of driving miles.
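Whether a dependency such as zipcode -> city or zipcode -> state actually holds can be tested against real data before normalizing around it. The helper below is a small sketch of such a check; the function name fd_violations and the dictionary row format are assumptions, and the two rows simply restate the Errol/Magalloway case.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return the lhs values that map to more than one rhs value.

    An empty result means the functional dependency lhs -> rhs holds
    for this particular data set (it says nothing about future data).
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return {key: values for key, values in seen.items() if len(values) > 1}


rows = [
    {"zipcode": "03579", "city": "Errol", "state": "NH"},
    {"zipcode": "03579", "city": "Magalloway", "state": "ME"},
]

print(fd_violations(rows, "zipcode", "state"))  # zipcode does not determine state
print(fd_violations(rows, "zipcode", "city"))   # nor city
print(fd_violations(rows, "city", "state"))     # holds for these two rows
```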
map of 03579

Relation Between Pharmacist and Patient in access 2016

I am creating a Pharmacy Database in Access 2016. It is my school Project and first Database Project.
My first problem is that we know that a Pharmacist can have many Patients, so it means that the relationship between Pharmacist and Patient is one-to-many. So in order to create a one-to-many relation, I made Pharmacist_ID as Primary Key.
Now the problem is that we know that the relation of Address and Patient is one-to-one, so how can I accomplish this task?
Another problem is that I already have the address, the city and nationality which are linked with the Pharmacist_ID. Can I link these tables with Patient_ID?
I am confused because the data type of Pharmacist_ID is AutoNumber. The Patient_ID of the first Patient will be 1, and the Pharmacist_ID of the first Pharmacist will also be 1, so what will happen?
Again, I am on MS-Access 2016.
This is the Picture of The RelationShip and you can see the Details of my Tables
Regards,
Arslan Iftikhar
This is for Thomas G: check it out, Thomas. Do you think I am doing it right or wrong?
I would make the changes below to the Address table:
I would prefer creating one common Address table which also holds City and Nationality (for simplicity; otherwise link them as in image 2 below).
Add a field PID (Number) in which to save the Pharmacist or Patient ID.
Add a field Ptype (Number) in which to save 1 for a Pharmacist and 2 for a Patient, so we can easily differentiate using this field.
Image 1
Image 2
A few design mistakes in your approach.
I will list a few things I think about, and try to train you to raise the right questions.
My first problem is that we know that a Pharmacist can have many
Patients, so it means that the relationship between Pharmacist and
Patient is one-to-many
The first part is only partially correct, which makes the second part incorrect and might lead to a big design failure.
In a normal world:
a Pharmacist can have many Patients
a Patient can have many Pharmacists
Isn't that so?
Thus you have a m-to-m relationship. How do you solve this? With an intermediate table storing the relation between patients and pharmacists.
The only exception to this is if you make your software for only one Pharmacist. Then your 1-to-m approach will work, but then I don't see any reason to have a Pharmacist table :)
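A minimal sketch of such an intermediate table follows. It is written with Python's sqlite3 module rather than Access so that it can be run as-is; the table name PharmacistPatient and the sample names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Pharmacist (
    Pharmacist_ID INTEGER PRIMARY KEY,
    Name          TEXT NOT NULL
);
CREATE TABLE Patient (
    Patient_ID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
-- One row per (pharmacist, patient) pair that actually exists.
CREATE TABLE PharmacistPatient (
    Pharmacist_ID INTEGER NOT NULL REFERENCES Pharmacist (Pharmacist_ID),
    Patient_ID    INTEGER NOT NULL REFERENCES Patient (Patient_ID),
    PRIMARY KEY (Pharmacist_ID, Patient_ID)
);
INSERT INTO Pharmacist VALUES (1, 'Pharmacist A'), (2, 'Pharmacist B');
INSERT INTO Patient    VALUES (1, 'Patient X'),    (2, 'Patient Y');
INSERT INTO PharmacistPatient VALUES (1, 1), (1, 2), (2, 1);
""")


def patients_of(pharmacist_id):
    """Names of all patients linked to one pharmacist."""
    rows = conn.execute(
        """SELECT p.Name FROM PharmacistPatient pp
           JOIN Patient p ON p.Patient_ID = pp.Patient_ID
           WHERE pp.Pharmacist_ID = ? ORDER BY p.Name""",
        (pharmacist_id,),
    )
    return [name for (name,) in rows]


def pharmacists_of(patient_id):
    """Names of all pharmacists linked to one patient."""
    rows = conn.execute(
        """SELECT ph.Name FROM PharmacistPatient pp
           JOIN Pharmacist ph ON ph.Pharmacist_ID = pp.Pharmacist_ID
           WHERE pp.Patient_ID = ? ORDER BY ph.Name""",
        (patient_id,),
    )
    return [name for (name,) in rows]


print(patients_of(1))
print(pharmacists_of(1))
```

Pharmacist 1 and Patient 1 both have the ID 1 here without any conflict, because the two numbers live in different columns and each foreign key points at exactly one table.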
Now the problem is that we know that the relation of Address and
Patient is one-to-one, so how can I accomplish this task?
Are you really sure about this? What happens in such cases then:
Patients members of the same family (living in the same house).
Patients for which the billing address is not the same as home address.
Patients that are also Pharmacists.
Patients owning several houses.
Those are very common cases. If you go the 1-1 route, you'll end up with A LOT of duplicates in your address table.
The REAL reason we almost always put addresses in a separate table is that addresses are rarely one-to-one in information systems. If they were one-to-one, there would be no real reason to store them in additional tables.
I am confused because the data-type of Pharmacist_ID is Auto-Number.
The Patient_ID of the first Patient will be 1 and then Pharmacist_ID
of the first Pharmacist will also 1 so what will happen?
And that is a good question, which should have led you to the design mistake above. You should not have a 1-1 between address and something else (Patient or Pharmacist).
In your Patient AND Pharmacist tables, you should have an AddressID referring to the ID in the address table. If you want to give pharmacists the option to store:
Home Address
Billing Address
Holiday Address
Whatever additional address
You should either:
Create an AddressID field in your Patient table (and possibly your Pharmacist table) for each of the address types.
Or, if you really have many types of addresses, create an intermediate table handling the m-to-m between Address and Patient/Pharmacist, with at least a Type column in it.
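The second option could look roughly like the sketch below (Python's sqlite3 module, for runnability; the thread itself is about Access). The table name PatientAddress, the allowed type values and the sample rows are assumptions for illustration, and a matching link table would be needed for pharmacists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Address (
    Address_ID INTEGER PRIMARY KEY,
    Street     TEXT NOT NULL,
    ZipCode    TEXT NOT NULL,
    City       TEXT NOT NULL,
    Country    TEXT NOT NULL
);
CREATE TABLE Patient (
    Patient_ID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
-- Link table: which patient uses which address, and in what role.
CREATE TABLE PatientAddress (
    Patient_ID  INTEGER NOT NULL REFERENCES Patient (Patient_ID),
    Address_ID  INTEGER NOT NULL REFERENCES Address (Address_ID),
    AddressType TEXT    NOT NULL
        CHECK (AddressType IN ('Home', 'Billing', 'Holiday')),
    PRIMARY KEY (Patient_ID, Address_ID, AddressType)
);
-- Placeholder data: two family members share a home address,
-- and one of them has a separate billing address.
INSERT INTO Address VALUES
    (1, '1 Example Street', '00000', 'Example City', 'Exampleland'),
    (2, '2 Example Street', '00000', 'Example City', 'Exampleland');
INSERT INTO Patient VALUES (1, 'Patient X'), (2, 'Patient Y');
INSERT INTO PatientAddress VALUES
    (1, 1, 'Home'),
    (2, 1, 'Home'),
    (1, 2, 'Billing');
""")

# Everyone living at address 1; the address itself is stored only once.
household = [
    name
    for (name,) in conn.execute(
        """SELECT p.Name FROM PatientAddress pa
           JOIN Patient p ON p.Patient_ID = pa.Patient_ID
           WHERE pa.Address_ID = 1 AND pa.AddressType = 'Home'
           ORDER BY p.Name"""
    )
]
address_count = conn.execute("SELECT COUNT(*) FROM Address").fetchone()[0]
print(household, address_count)
```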
Edit. Reactions on your new model.
I guess the point of your CITY table is to have a big list of cities that you can use for comboboxes, for instance? If you go that way, you might do the same for countries, or states/regions. That's fine, BUT the City_ID (and possibly State_ID and Country_ID) should be part of the ADDRESS table. It doesn't make sense to have an address with only a street, house number and PO box. That's incomplete: an address should also contain a zip code, city and country.

Extendable database schema for contacts (social)

I have an old application that needs upgrading. Doesn't everything nowadays?
The existing DB schema consists of predefined fields like phone, fax, email. Obviously with the social explosion over the last 5-7 years (or longer depending on your country) end users need more control over creating contact cards the way they see fit rather than just what I think might be useful.
I'm concerned here with "digital" addresses, i.e. one-line addresses such as phone=ccc ccc ccc ccc.
Since physical addresses are pretty standard in terms of requirements, in this case users will have to use what they are given (location, postal, delivery) in order to keep the scope manageable.
So I'm wondering what the best practice format for storing digital info is. To me it seems I have two choices:
A simple 4 field table (ContactId, AddressTypeId, Address, FormatterId)
1000, "phone", "ccc ccc ccc ccc", phoneformatter
1000, "facebook", "myfacebook", facebookformatter
This would then be JOINED anywhere it's needed. The table would get massive though, and the join performance would degrade over time, I suspect.
A json blob that would require additional processing once read (ContactId, Addresses)
1000, {{"phone": "ccc ccc ccc ccc"}, {"facebook": "myfacebook"}}
Or ... something else.
This db is for use in a given country by customers only trading domestically with client bases ranging from 3000-12000 accounts and then however many contacts per account - averages about 10 in current system.
My primary concern is user flexibility but performance is a key consideration in that. So I dunno, just do whatever and throw heaps of hardware at it ;)
Application is in C# if that makes any difference re: post query processing.
I would not go for the JSON blob. This will be nasty if you need to answer any queries like:-
Does anyone have me in their Facebook contacts?
What's the most popular type of social media contact?
You would be forced to parse the JSON for every record and be unable to create a simple index.
Your first solution is nearly correct; however, FormatterId would need to be on an AddressType table. What you have is not normalised, as FormatterId would depend only on AddressTypeId. So you would have three tables:-
Contact
ContactAddress
AddressType
You haven't stated whether you need to store two addresses of the same type against a single contact, e.g. if someone has two Twitter accounts. Answering this question will allow you to define the correct primary key on ContactAddress. It would either be (ContactId, AddressTypeId), if you can only have one of each type per contact, or a synthetic key (ContactAddressId).
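Here is a sketch of the three tables using the synthetic-key variant, with both example queries answered by plain SQL. It runs on Python's sqlite3 module; the question's application is C#, so treat it as schema illustration only. The formatter IDs, the extra contacts and the index name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Contact (
    ContactId INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL
);
CREATE TABLE AddressType (
    AddressTypeId INTEGER PRIMARY KEY,
    Name          TEXT NOT NULL UNIQUE,
    FormatterId   INTEGER NOT NULL      -- depends only on the type
);
CREATE TABLE ContactAddress (
    ContactAddressId INTEGER PRIMARY KEY,  -- synthetic key: allows two of a type
    ContactId        INTEGER NOT NULL REFERENCES Contact (ContactId),
    AddressTypeId    INTEGER NOT NULL REFERENCES AddressType (AddressTypeId),
    Address          TEXT NOT NULL
);
CREATE INDEX ix_contactaddress_type_address
    ON ContactAddress (AddressTypeId, Address);

INSERT INTO Contact VALUES (1000, 'Contact A'), (1001, 'Contact B'), (1002, 'Contact C');
INSERT INTO AddressType VALUES (1, 'phone', 1), (2, 'facebook', 2), (3, 'twitter', 3);
INSERT INTO ContactAddress (ContactId, AddressTypeId, Address) VALUES
    (1000, 1, 'ccc ccc ccc ccc'),
    (1000, 2, 'myfacebook'),
    (1001, 2, 'otherfacebook'),
    (1001, 3, 'twitterone'),
    (1001, 3, 'twittertwo'),
    (1002, 2, 'myfacebook');
""")

# "Does anyone have me in their Facebook contacts?"
who_has_me = [
    contact_id
    for (contact_id,) in conn.execute(
        """SELECT a.ContactId FROM ContactAddress a
           JOIN AddressType t ON t.AddressTypeId = a.AddressTypeId
           WHERE t.Name = 'facebook' AND a.Address = 'myfacebook'
           ORDER BY a.ContactId"""
    )
]

# "What's the most popular type of social media contact?"
most_popular = conn.execute(
    """SELECT t.Name, COUNT(*) AS n FROM ContactAddress a
       JOIN AddressType t ON t.AddressTypeId = a.AddressTypeId
       GROUP BY t.Name ORDER BY n DESC LIMIT 1"""
).fetchone()

print(who_has_me, most_popular)
```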
Well, I believe you have a table named contact
contact(contactid, contact details, other details)
and now you want to remove the contact details from the contact table, because they may contain digital addresses, phone numbers and so on.
But the table you are considering
(ContactId, AddressTypeId, Address, FormatterId) is not in normal form, and you can't uniquely identify a tuple until you read all four columns, which is bad. In this case indexing is not going to help you either.
So it is better to have a separate table for each type of digital address, with an index on contactid:
facebookdetails(contactid, rest of the details)
phonedetails(contactid, rest of the details)
The query can then be a join of all the tables, and it will not degrade performance.
Hope this will help :)

How far does one go to eliminate duplicate data in a database?

How far does one go to eliminate duplicate data in a database? Because you could go OTT and it would get crazy. Let me give you an example...
If I were to create a Zoo database which contains a table 'Animal' which has a 'name', 'species' and 'country_of_birth'
But there will be duplicate data there as many animals could come from same country and there could be lots of tigers, for example.
So really there should be a 'Species' table and a 'Country_of_birth' table
But then after a while you would have tons of tables
So how far do you go?
In this question I am just using one table as an example. One row in the Animal table stores information about a single animal in the zoo. So that animal's name, species and country of birth, as well as a unique animalID.
But there will be duplicate data there as many animals could come from
same country and there could be lots of tigers, for example.
This suggests you want to keep track of individual animals, not just kinds of animals. Let's assume that the zoos use some kind of numeric tattoo or microchip to identify individual animals.
Assume this sample data is representative. (It's not, but it's ok for teaching.)
Animals
Predicate: Animal having microchip <chip_num> of species <species> has name <name> and was born in <birth_country_code>.
chip_num  name    species          birth_country_code
--------  ------  ---------------  ------------------
101234    Anita   Panthera tigris  USA
101235    Bella   Panthera tigris  USA
101236    Calla   Panthera tigris  USA
101237    Dingo   Canis lupus      CAN
101238    Exeter  Canis lupus      CAN
101239    Bella   Canis lupus      USA
101240    Bella   Canis lupus      CAN
There's no redundant data in that table. None of those columns can be dropped without radically changing the meaning of that table. It has a single candidate key: chip_num. It's in 5NF.
Values are repeated in non-key columns. That's kind of the definition of non-key (non-prime) columns. Values in key columns (or sets of key columns) are unique; values in non-key columns aren't.
If you want to restrict the values in "birth_country_code" to the valid three-letter ISO country codes, you can add a table of valid three-letter ISO country codes, and set a foreign key reference to it. This is generally a Good Thing, but it has nothing to do with normalization.
iso_country_code
--
CAN
USA
You could do the same thing again for "species". That, too, would generally be a Good Thing, and it, too, would have nothing to do with normalization.
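To make the foreign key idea concrete, here is a small sketch using Python's sqlite3 module. The table names animals and iso_country_codes and the rejected extra row are hypothetical; the seven animals are the sample data from above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
CREATE TABLE iso_country_codes (
    iso_country_code TEXT PRIMARY KEY
);
INSERT INTO iso_country_codes VALUES ('CAN'), ('USA');

CREATE TABLE animals (
    chip_num           INTEGER PRIMARY KEY,
    name               TEXT NOT NULL,
    species            TEXT NOT NULL,
    birth_country_code TEXT NOT NULL
        REFERENCES iso_country_codes (iso_country_code)
);
""")


def add_animal(chip_num, name, species, country):
    """Insert one animal; return False if the country code is not valid."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO animals VALUES (?, ?, ?, ?)",
                (chip_num, name, species, country),
            )
        return True
    except sqlite3.IntegrityError:
        return False


sample = [
    (101234, "Anita", "Panthera tigris", "USA"),
    (101235, "Bella", "Panthera tigris", "USA"),
    (101236, "Calla", "Panthera tigris", "USA"),
    (101237, "Dingo", "Canis lupus", "CAN"),
    (101238, "Exeter", "Canis lupus", "CAN"),
    (101239, "Bella", "Canis lupus", "USA"),
    (101240, "Bella", "Canis lupus", "CAN"),
]
accepted = [add_animal(*row) for row in sample]

# 'US' is not one of the valid three-letter codes, so this row is refused.
rejected = add_animal(101241, "Gus", "Canis lupus", "US")
print(all(accepted), rejected)
```

The repeated 'USA' and 'CAN' values are still stored in every row; the extra table only constrains which values are allowed.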
First you decide what the table is supposed to carry information about. In your example, is the table about individual animals? Or is it about species of animals and how many of each species? The fact that you have country of birth might be an indicator that someone wants it to be the former. If that is the case, you must have a key that identifies individual animals. You have an attribute (a property) that is associated with individuals, so each row must (should?) represent an individual. You should read up on the database modeling concepts of identity and individuation.
And to do this properly, actually, you do this for each thing in your data model, and then convert that model into database tables.
It comes down to deciding what is important to your system.
Deciding whether something (your e.g. "country of birth") is merely an attribute or is instead a full-blown entity in its own right depends on what else your system needs to know about countries and how many attributes your system may track that are fully functionally dependent on the country.
You should also consider whether your attributes are susceptible to update anomalies. If your statement of country in the animal table is in the form of the full official name of the country, then you might be at risk if, for example, "The Belgian Congo" suddenly becomes "The Democratic Republic of the Congo" - oh wait, that already happened!
The rules of normalization are not sacrosanct. They are pretty darn useful rules of thumb that are intended to keep you out of trouble, most of the time. Still, rules are made to be broken - but you should only break them knowingly and with a carefully considered understanding of the consequences.

Data constraints in US state/county/city/zip database?

I have a database of US zip codes and their corresponding states, cities and counties. It was supplied as a flat file and I'm trying to normalize the data and figure out exactly which entities depend on which others.
One problem I've come across is that some cities seem to exist in more than one county. I was under the impression that in the US, there is a hierarchy of State -> County -> City -> Zip.
However, this data seems to show otherwise for some cities:
Is my data set incorrect or is this actually a feature of US geography?
I am working with this same topic. I have learned that Virginia has cities that are not within a county. The city functions as both a city and a county but is not within any county boundary. Also, Alaska has no counties. Their equivalent is boroughs, but the whole state is not divided into boroughs. Any area not within a borough is referred to as the "unorganized borough".
No, there isn't a clean hierarchy like that.
You're also liable to find cities that straddle state borders (cities in two states), and ZIP codes that take in more than one city. Not long ago, there were ZIP codes that straddled state borders, too. (ZIP codes are more about the route followed to deliver mail than about geography.) There might still be some.
As far as I know, no county is split between two states. But if there happened to be one, it wouldn't surprise me.
Depending on your application, you might discover even weirder things. I used to have to deal with addresses in the mountains that were "in" one county geographically, but were "in" a second county for emergency services (fire, police), and "in" yet a third county for non-emergency services (water, sewer, garbage collection). It depended on where the address was in relation to mountain ridges and roads.
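Before normalizing a flat file like the one in the question, it may help to measure how often the assumed State -> County -> City hierarchy is broken. The query below is one way to do that, sketched with Python's sqlite3 module; the table name flat, its columns and the rows are placeholders, not real places.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flat (zip TEXT, city TEXT, county TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO flat VALUES (?, ?, ?, ?)",
    [
        ("00001", "City A", "County 1", "ST"),  # placeholder rows
        ("00002", "City A", "County 2", "ST"),  # same city, second county
        ("00003", "City B", "County 1", "ST"),
    ],
)

# Cities that appear under more than one county within the same state.
multi_county = conn.execute(
    """SELECT state, city, COUNT(DISTINCT county) AS county_count
       FROM flat
       GROUP BY state, city
       HAVING COUNT(DISTINCT county) > 1
       ORDER BY state, city"""
).fetchall()
print(multi_county)
```

If this returns rows for real data, city-to-county is many-to-many and needs its own linking table rather than a single County column on the city.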

Resources