City <-> Zipcode database design


I need to create a database containing all cities and zip codes worldwide. For that I want to create a table 'city' and a table 'zip_code'. My question is: what is the relation between city and zip code? Is it a 1:n relationship worldwide, or can it also be m:n in some countries?

It differs. In Holland we usually have multiple zipcodes per street. The zip codes are so fine-grained (4 digits + 2 letters) that the zipcode and the house number alone are enough to uniquely identify a building.
In Belgium, though, there's a 4-digit postal code. A couple of towns can share the same zipcode, while a larger city can have multiple zipcodes. It can even happen that a city has multiple zipcodes, and each (or some) of them are shared with smaller towns as well.
So I would almost say that there is no relation between zip code and city, or at least no single definition that works on a global scale. If you store it in a database, then it's definitely an m:n relation.

For the US at least it is M:M. A city can have multiple zips, and a zip can cover more than one city (obviously depending on how loosely you define a city vs. a town or a municipality).
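As a rough illustration of the m:n design both answers point to, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (city, zip_code, city_zip_code, country) and all of the data are made up for the example; they are not from any real postal data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (
        city_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE zip_code (
        zip_id  INTEGER PRIMARY KEY,
        country TEXT NOT NULL,  -- assumption: a code is only unique within a country
        code    TEXT NOT NULL,  -- TEXT keeps leading zeros and letters
        UNIQUE (country, code)
    );
    -- Junction table: one row per (city, zip code) pairing.
    CREATE TABLE city_zip_code (
        city_id INTEGER NOT NULL REFERENCES city(city_id),
        zip_id  INTEGER NOT NULL REFERENCES zip_code(zip_id),
        PRIMARY KEY (city_id, zip_id)
    );
""")

# Hypothetical data: Bigcity has two codes, and code 1001 is shared with Smalltown.
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 [(1, "Bigcity"), (2, "Smalltown")])
conn.executemany("INSERT INTO zip_code VALUES (?, ?, ?)",
                 [(1, "XX", "1000"), (2, "XX", "1001")])
conn.executemany("INSERT INTO city_zip_code VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def codes_for_city(name):
    """Return all codes linked to a city name, sorted."""
    rows = conn.execute("""
        SELECT z.code
        FROM zip_code z
        JOIN city_zip_code cz ON cz.zip_id = z.zip_id
        JOIN city c ON c.city_id = cz.city_id
        WHERE c.name = ?
        ORDER BY z.code""", (name,))
    return [r[0] for r in rows]


def cities_for_code(country, code):
    """Return all city names linked to a (country, code) pair, sorted."""
    rows = conn.execute("""
        SELECT c.name
        FROM city c
        JOIN city_zip_code cz ON cz.city_id = c.city_id
        JOIN zip_code z ON z.zip_id = cz.zip_id
        WHERE z.country = ? AND z.code = ?
        ORDER BY c.name""", (country, code))
    return [r[0] for r in rows]
```

A 1:n country would simply end up with one junction row per zip code, so the same schema covers both cases.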

Related

Does City and Zip in Address Table Violate BCNF

Let's say you have the following:
attribute       constraints
---------       -----------
id              (PK)
location_name   **
street
zipcode
city
** realistically unique, but not going to be relied on, for future-proofing?
Would this violate BCNF, as zipcode can be used to find the city? Although cities can share zipcodes and vice versa, a city can't be in two separate zipcodes where another city is part of that zipcode?
(zipcode1 --> city1) and (zipcode2 --> city1 and city2)
(Note that zipcode and city are not a composite superkey, as multiple locations can be associated with the same zipcode and city.) Is BCNF suggesting that you should have a completely separate table JUST for pairing cities and zipcodes?
States are omitted because this database is for a single state. If states were included, though, would you need three tables, since a zipcode cannot be in multiple states (edit: apparently some are, but assume there aren't)? That seems too dumb to be true, and way too many unions would be needed.
I honestly don't understand much of anything regarding key terms and have just been left confused (if you could answer in layman's terms and/or technically, that's highly appreciated). I tried searching for an answer because I figured it would be common, but couldn't find anything. Given my inability to organize and process mathematical logic, I'm starting to wonder if I picked the wrong field to enter.

What does a five-digit zipcode actually determine? As I understand it, it determines a post office. This is enough to route every piece of mail from wherever it is to a destination post office. That post office then delivers it locally.
Figuring out what the dependencies are between zip code and state or zip code and city, or zip code and street plus number or apartment number can be the devil's own business.
The area served by a post office is generally part of some community that the Post Office is in, like a town. But there are quirks.
The residents of Magalloway, ME are served by the post office in nearby Errol, NH. They therefore use zipcode 03579, the same as the residents of downtown Errol. The letters get forwarded to the Errol post office, then delivered to them in Maine. This may seem very strange, but it works out well in terms of driving miles.
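Whether zipcode actually determines city (the dependency BCNF cares about) can be checked against the data itself. The sketch below uses Python's sqlite3 and assumes a location table like the one in the question, plus a state column that the question left out. The Errol/Magalloway pairing for 03579 comes from the example above; the street values and the third row (including the code 00000) are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE location (
        id      INTEGER PRIMARY KEY,
        street  TEXT,
        zipcode TEXT,
        city    TEXT,
        state   TEXT  -- assumption: added here, not in the question's table
    )""")
conn.executemany(
    "INSERT INTO location (street, zipcode, city, state) VALUES (?, ?, ?, ?)",
    [
        ("1 Hypothetical St", "03579", "Errol", "NH"),
        ("2 Hypothetical Rd", "03579", "Magalloway", "ME"),
        ("3 Hypothetical Ave", "00000", "Sampletown", "NH"),  # placeholder row
    ],
)


def zipcodes_breaking_fd(conn):
    """Zipcodes that map to more than one (city, state) pair.

    If this returns anything, zipcode -> city, state is not a
    functional dependency in the data.
    """
    rows = conn.execute("""
        SELECT zipcode
        FROM location
        GROUP BY zipcode
        HAVING COUNT(DISTINCT city || ', ' || state) > 1
        ORDER BY zipcode""")
    return [r[0] for r in rows]
```

An empty result only shows that the current rows happen to agree; it does not prove the dependency holds for addresses added later.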

Why would the specifications for this database use an aggregation instead of attributes on an entity?

I'm trying to better understand designing a database schema. After reviewing the solution for a problem that I'm working on, I don't understand why the solution chooses to use an aggregation for the attributes "address" and "phone number" for a given "musician". Here are the specifications, I'm only interested in bullet point 1:
- Each musician that records at Notown has an SSN, a name, an address, and a phone number. Poorly paid musicians often share the same address, and no address has more than one phone.
- Each instrument used in songs recorded at Notown has a name (e.g., guitar, synthesizer, flute) and a musical key (e.g., C, B-flat, E-flat).
- Each album recorded on the Notown label has a title, a copyright date, a format (e.g., CD or MC), and an album identifier.
- Each song recorded at Notown has a title and an author.
- Each musician may play several instruments, and a given instrument may be played by several musicians.
- Each album has a number of songs on it, but no song may appear on more than one album.
- Each song is performed by one or more musicians, and a musician may perform a number of songs.
- Each album has exactly one musician who acts as its producer. A musician may produce several albums, of course.
Here is a solution that I found:
The ER Diagram I created looks almost exactly the same, except for the fact that I made "address" and "phone number" attributes of "musician" instead of giving each of them an entity set of their own, creating a relationship, and turning it into an aggregation. I don't understand why this would be done in this situation. Can anyone explain?? Thank you!

I'm not able to see the image you linked to, but anyway...
no address has more than one phone
This means we should make the phone number an attribute of the address - unless we want to allow for multiple phones per address in the future.
So it would not be completely wrong to make phones a table. But then, we know little about the future. Would there be multiple musicians sharing the same address and the same phones? (I.e. the phone number would be linked to an address.) Or would there be multiple musicians sharing the same address, but each would have their own phone? (I.e. the phone number would be linked to a musician. To use a phone table and link the phones to musicians, however, would only be necessary if a musician could have multiple phone numbers. Otherwise we'd still not make a phone table, but rather make the phone a musician's attribute.)
poorly paid musicians often share the same address
This means we make the address a table of its own. Thus there is only one row to change in case the phone number or some other attribute changes. If we made the address a musician's attribute instead, we'd store the address redundantly and could get inconsistent data (e.g. same address, but different phone numbers).
A possible data model:
address (address_id, street, city, phone, ...)
musician (musician_id, ssn, name, address_id, ...)
This is a 1:n relation. A musician has one address; an address can belong to multiple musicians.
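A minimal sketch of that data model with Python's sqlite3, to show the "one row to change" point. The column types, the UNIQUE constraint on ssn, and all sample values are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        street     TEXT NOT NULL,
        city       TEXT NOT NULL,
        phone      TEXT            -- at most one phone per address
    );
    CREATE TABLE musician (
        musician_id INTEGER PRIMARY KEY,
        ssn         TEXT NOT NULL UNIQUE,
        name        TEXT NOT NULL,
        address_id  INTEGER NOT NULL REFERENCES address(address_id)
    );
""")

# Hypothetical data: two musicians sharing one address.
conn.execute(
    "INSERT INTO address VALUES (1, '1 Example St', 'Exampleville', '555-0100')")
conn.executemany(
    "INSERT INTO musician VALUES (?, ?, ?, ?)",
    [(1, "000-00-0001", "Ann", 1), (2, "000-00-0002", "Bob", 1)],
)

# The phone number changes: exactly one row is updated.
conn.execute("UPDATE address SET phone = '555-0199' WHERE address_id = 1")


def phones_by_musician(conn):
    """Return (name, phone) for every musician, via their address."""
    rows = conn.execute("""
        SELECT m.name, a.phone
        FROM musician m
        JOIN address a ON a.address_id = m.address_id
        ORDER BY m.name""")
    return rows.fetchall()
```

Both musicians see the new number after the single UPDATE, because neither row stores the phone itself.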

The primary purpose of database normalization is to make it more difficult for anomalous data to get into the database. Reading the first bullet point, we see that each address may have zero or one phone numbers associated with it. In other words, the phone number is an attribute of/identified by the address. Which normalization level does this violate?
To illustrate how not normalizing the address fields (including phone number) increases the chances of anomalous data, let's say you have four students staying at that address. This means you have four rows where the address data exists. Suppose the phone number changes. You have to make sure you change all four copies of the data. I said there were four students, but suppose there are actually five and I just missed one? Or suppose you found only three when you went to make the change? An address may have at most one phone number, yet now you have several copies of the same address with different phone numbers. This is anomalous data.
If this data is normalized, you would have only one copy to change. Since this data is referenced by all the students who live there, no matter how many, this change is "propagated" to all of them. The integrity of the data is maintained.

Database Design - Franchise

I am trying to design a database that stores the following:
- Franchise Name (string)
- Employee Count (integer)
- Locations (zip-code)
The problem I am having is the potential size of the database (all potential franchises, as well as locations) and how to design the database so that the zip-codes for a specific franchise can be stored in the same cell. Is this possible to have a cell that has something like "95432, 12345, 92534, 68723" and can be queried and modified to add or delete zip-codes with a query? I want to be able to regularly do something like:
SELECT franchiseName
FROM database
WHERE zipCode = "12345"
and then obviously, display all franchises that have a location in that zip-code. I wouldn't want to create a separate tuple for each zip-code right? Wouldn't something like:
Pizza Hut, 23, 95432
Pizza Hut, 12, 12345
Pizza Hut, 07, 92534
Pizza Hut, 15, 68723
be considered bad design because of the potential number of duplicate Franchise names? (imagine McDonald's alone)
Any help is appreciated

Basically you're defining a many-to-many relationship. The common practice in this situation is to work with a table containing all the locations and a mapping table. The mapping table contains the id of the franchise and the id of the location. Using joins, you can get all the information you need and there is no data duplication.

A de-normalized design would have a corporation holding the name, and associated franchises.
This also allows storing corporate info (e.g., headquarters info) common to all franchises.

You should split that information into two tables:
franchise(id,name, employee_count)
zipcode(franchise_id,zipcode)
If a 'location' contains more information than just a zip code you should put all locations in a separate table and have a mapping table between the franchises and the locations.
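A rough sketch of the two-table split in Python's sqlite3, showing the lookup from the question plus adding and removing a zip code with ordinary statements instead of editing a comma-separated cell. The second franchise, the helper function names, and the constraints are made up for illustration; the employee count is just the sum of the per-location figures in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE franchise (
        id             INTEGER PRIMARY KEY,
        name           TEXT NOT NULL UNIQUE,
        employee_count INTEGER
    );
    CREATE TABLE zipcode (
        franchise_id INTEGER NOT NULL REFERENCES franchise(id),
        zipcode      TEXT NOT NULL,
        PRIMARY KEY (franchise_id, zipcode)
    );
""")

# 'Pizza Hut' and its zip codes come from the question; 'Example Burgers' is hypothetical.
conn.executemany("INSERT INTO franchise VALUES (?, ?, ?)",
                 [(1, "Pizza Hut", 57), (2, "Example Burgers", 10)])
conn.executemany("INSERT INTO zipcode VALUES (?, ?)",
                 [(1, "95432"), (1, "12345"), (1, "92534"), (1, "68723"),
                  (2, "12345")])


def franchises_in(zip_code):
    """Names of all franchises with a location in the given zip code."""
    rows = conn.execute("""
        SELECT f.name
        FROM franchise f
        JOIN zipcode z ON z.franchise_id = f.id
        WHERE z.zipcode = ?
        ORDER BY f.name""", (zip_code,))
    return [r[0] for r in rows]


def add_zip(franchise_id, zip_code):
    """Add one location; the franchise name is not repeated anywhere."""
    conn.execute("INSERT INTO zipcode VALUES (?, ?)", (franchise_id, zip_code))


def remove_zip(franchise_id, zip_code):
    """Remove one location without touching the others."""
    conn.execute("DELETE FROM zipcode WHERE franchise_id = ? AND zipcode = ?",
                 (franchise_id, zip_code))
```

The franchise name is stored once, so a chain with thousands of locations only repeats a small integer id, not the name.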

Data constraints in US state/county/city/zip database?

I have a database of US zip codes and their corresponding states, cities and counties. It was supplied as a flat file and I'm trying to normalize the data and figure out exactly which entities depend on which others.
One problem I've come across is that some cities seem to exist in more than one county. I was under the impression that in the US, there is a hierarchy of State -> County -> City -> Zip.
However, this data seems to show otherwise for some cities:
Is my data set incorrect or is this actually a feature of US geography?

I am working on this same topic. I have learned that Virginia has cities that are not within a county. The city functions as both a city and a county, but is not within any county boundary. Also, Alaska has no counties. Their equivalent is boroughs, but the whole state is not divided into boroughs. Any area not within a borough is referred to as the "unorganized borough".

No, there isn't a clean hierarchy like that.
You're also liable to find cities that straddle state borders (cities in two states), and ZIP codes that take in more than one city. Not long ago, there were ZIP codes that straddled state borders, too. (ZIP codes are more about the route followed to deliver mail than about geography.) There might still be some.
As far as I know, no county is split between two states. But if there happened to be one, it wouldn't surprise me.
Depending on your application, you might discover even weirder things. I used to have to deal with addresses in the mountains that were "in" one county geographically, but were "in" a second county for emergency services (fire, police), and "in" yet a third county for non-emergency services (water, sewer, garbage collection). It depended on where the address was in relation to mountain ridges and roads.

Creating an address database

I am re-creating a part of my company’s database because it does not meet future needs.
Currently we have mainly a flat file and some disjointed tables that were never fully realized.
My way of thinking is that we have a table for each category, except maybe the zips table, which may serve as a connect-it-all-together table.
Please refer to image below:
Database Diagram http://www.freeimagehosting.net/uploads/248cc7e884.jpg
One thing I am thinking of is removing the zip table and just putting the zip code in the zipstocities table since the zip code is almost unique and then indexing the table on the zip code. The only downside is zip code has to be a varchar to take care of zip codes with leading zeros. Just want to know if there is a flaw in my logic.
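The leading-zero concern is easy to demonstrate. This is a small sketch with Python's sqlite3; the table names zip_as_int and zip_as_text are made up, and 03579 is used only as a sample code with a leading zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zip_as_int (zip INTEGER)")
conn.execute("CREATE TABLE zip_as_text (zip TEXT)")
conn.execute("CREATE INDEX idx_zip_text ON zip_as_text (zip)")  # text indexes work fine

raw = "03579"

# Storing the code as a number drops the leading zero...
conn.execute("INSERT INTO zip_as_int VALUES (?)", (int(raw),))
# ...while storing it as text keeps it exactly as written.
conn.execute("INSERT INTO zip_as_text VALUES (?)", (raw,))

as_int = conn.execute("SELECT zip FROM zip_as_int").fetchone()[0]
as_text = conn.execute("SELECT zip FROM zip_as_text").fetchone()[0]
```

A zip code is an identifier rather than a quantity, so a character column is the usual choice and not really a downside.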

I don't know the US ZIP code and territorial division system well, but I assume it's somewhat like the German one.
A state has many counties.
A county has many cities.
A city has many zip codes.
Hence I would use the following schema.
ZipCodes            CityZipCodes
------------        ----------------      Cities
ZipCode (PK) <───   ZipCode (PK)(FK)      -----------
                    City (PK)(FK)   ───>  CityId (PK)
                                          Name
                                          County (FK) ───┐
                                                         │
                                                         │
                    Counties                             │
                    -------------                        │
States              CountyId (PK) <──────────────────────┘
-----------------   Name
StateId (PK) <───   State (FK)
Name
Abbreviation
Fixed for multiple cities per ZIP code.

One thing you should be aware of is that not all cities are in counties. In Virginia you are in either a city or county but never both.

Looking at the diagram you have, the state table is the only one of the 4 outside tables that is really necessary. Lookup tables with just an ID and a single value aren't worth the effort. These relationships are designed to make a single value in the main table (ziptocities) refer to a set of related data in the lookup table (states).

You'll need to ask yourself why you care about counties. In many states in the US, they have little importance beyond tradition and maps.
The other question will be how important will it be that the address be accurate? How many deaths will there be if important letters are not delivered in a timely manner (possibly many if the letter is about prescription drug recalls!)
You probably want to think about using data from the Postal Service, possibly using a product that corrects addresses. That way, when you get a good address, you'll be certain the mail can be delivered there - because the Postal Service will have said so!

There seem to be flaws in both your process and your logic.
I suggest that you stop thinking about tables and relationships for a moment. Instead, think about facts. Make a list of valid addresses that your database needs to support. Many surprises await you.
Don't confuse an address with a mailing label. They're not at all the same thing. Consider modeling carriers, too. In the US, whether an address is valid depends on the carrier. For example, my PO box is a valid address when the carrier is the USPS, but not when the carrier is UPS.
To save time, you might try browsing some international address formats on bitboost.

Will your logic work if two countries happen to have the same zip code? The two would point to different cities in that case. Here are some points to consider:
- Do you want to use zipcode as a kind of primary key into the address (at least the city, state and country fields)? In that case, you can have zipcode, city, state and country in one table. Create indexes on city, state, etc. (You have a functional dependency of the form zipcode -> country, state, city. This, as I said, may not be true across countries.)
- If auto-populating is your only concern, create a materialized view and use it.
I would recommend reading 'Data Model Patterns' by David C. Hay.

But not every person who has a valid medical claim is required by law to remain in the US until the claim is settled. People move.

San Francisco is a city in California; it's not a city in Alabama. Does your design prevent nonsense entries like "San Francisco, AL"?
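One way a design could rule out such entries is a composite foreign key from the address to a table of valid (city, state) pairs. The sketch below uses Python's sqlite3; the table and column names and the try_insert helper are hypothetical and not taken from the diagram in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE state (
        abbreviation TEXT PRIMARY KEY
    );
    CREATE TABLE city (
        name  TEXT NOT NULL,
        state TEXT NOT NULL REFERENCES state(abbreviation),
        PRIMARY KEY (name, state)
    );
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        street     TEXT NOT NULL,
        city       TEXT NOT NULL,
        state      TEXT NOT NULL,
        -- The pair must exist in city, so the state cannot be chosen independently.
        FOREIGN KEY (city, state) REFERENCES city(name, state)
    );
    INSERT INTO state VALUES ('CA'), ('AL');
    INSERT INTO city VALUES ('San Francisco', 'CA');
""")


def try_insert(street, city, state):
    """Insert an address; return False if the (city, state) pair is unknown."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO address (street, city, state) VALUES (?, ?, ?)",
                (street, city, state),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this constraint, "San Francisco, CA" is accepted and "San Francisco, AL" is rejected by the database itself rather than by application code.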
