Does City and Zip in Address Table Violate BCNF - database

Let's say you have the following table:
---------------------------
attribute       | constraints
--------------- | -----------
id              | (PK)
location_name   | **
street          |
zipcode         |
city            |
** realistically unique, but I'm not going to rely on that, for future-proofing
Would this violate BCNF, as zipcode can be used to find the city? Although cities can share zipcodes and vice versa, a city can't be in two separate zipcodes where another city is part of that zipcode?
(zipcode1 --> city1) and (zipcode2 --> city1 and city2)
(Note that zipcode and city are not a composite superkey, as multiple locations can be associated with the same zipcode and city.) Is BCNF suggesting that you should have a completely separate table JUST for pairing cities and zipcodes?
States are omitted because this database is for a single state. Although in that case, would you have to have 3 tables, since a zipcode cannot be in multiple states? (Edit: apparently some are, but assume there aren't.) That seems too dumb to be true, and way too many unions would be needed.
I honestly don't understand much of anything regarding key terms and have just been left confused (if you could answer in layman's terms and/or technically, that's highly appreciated). I tried searching for an answer because I figured it would be common, but couldn't find anything. Given my inability to organize and process mathematical logic, I'm starting to wonder if I picked the wrong field to enter.

What does a five-digit zipcode actually determine? As I understand it, it determines a post office. This is enough to route every piece of mail from wherever it is to a destination post office. That post office then delivers it locally.
Figuring out what the dependencies are between zip code and state or zip code and city, or zip code and street plus number or apartment number can be the devil's own business.
The area served by a post office is generally part of some community that the Post Office is in, like a town. But there are quirks.
The residents of Magalloway, ME are served by the post office in nearby Errol, NH. They therefore use zipcode 03579, the same as the residents of downtown Errol. The letters get forwarded to the Errol post office, then delivered to them in Maine. This may seem very strange, but it works out well in terms of driving miles.
map of 03579
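To make the dependency question concrete, here is a minimal sketch that tests whether zipcode -> city holds in a few sample rows. The helper name `violations` and the street values are made up for illustration; only the 03579 pairing of Errol and Magalloway comes from the example above.

```python
from collections import defaultdict

def violations(rows, lhs, rhs):
    """Return {lhs_value: set_of_rhs_values} for every lhs value that maps
    to more than one rhs value, i.e. every place where lhs -> rhs fails."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return {key: values for key, values in seen.items() if len(values) > 1}

# Hypothetical address rows; the streets are invented placeholders.
rows = [
    {"id": 1, "street": "1 Main St", "zipcode": "03579", "city": "Errol"},
    {"id": 2, "street": "2 Lake Rd", "zipcode": "03579", "city": "Magalloway"},
    {"id": 3, "street": "3 Main St", "zipcode": "03579", "city": "Errol"},
]

# Non-empty result: zipcode does not determine city in this data.
bad = violations(rows, "zipcode", "city")
print(bad)
```

Sample data can only refute a dependency, never prove one; that part has to come from the business rules. If the rules did guarantee one city per zipcode, then zipcode -> city would be a dependency whose left side is not a superkey, and the table would not be in BCNF until the pairing was moved to its own table.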

Related

RDBMS: How to model a company having different products at multiple locations

I have an existing database that models all the products a company is either producing or consuming. The database is quite simple:
Table: companies {PK: company_id}
+------------+--------------+
| company_id | company_name |
+------------+--------------+
Table: products {PK: product_id}
+------------+--------------+---------------+
| company_id | product_id | product_price |
+------------+--------------+---------------+
Now, if I need to add location information to it, it starts to get complicated.
Basically, now a company has many locations and each location has many products.
To further complicate matters, some attributes of the product e.g. price may not be the same at each location. I would like to share other common attributes at all locations (Basically, I want to avoid creating three copies of product A that's used at all three locations).
I'm not sure what the best way to model this is. I can think of
Table: company_location
+------------+-------------+
| company_id | location_id |
+------------+-------------+
Table: location_product
+-------------+------------+
| location_id | product_id |
+-------------+------------+
But this design would not allow product attributes to change per location, without creating an entirely different product for each location. I also don't have a way to maintain a master product list per company.
Any help is appreciated.
PS: I'm using a PostgreSQL database
The rules of normalization would tell you that you need your non-key attributes to depend on all of the key values (and nothing else).
If price is determined by:
- The company who makes it
- The location that sells it
- What the product actually is
Then that implies that PRICE needs a candidate key that specifies company, location and product.
The issue becomes what the relationships are between companies, products and locations. Also, what else do you know (what columns do you have) about these three kinds of things?
If they are all totally independent, for example, the products are commodities and don't depend at all on companies and the locations are independent distributors, which have nothing to do with either companies or what kind of products are sold there, then really a single three-way join is probably your best bet.
However, if there are some linkages between company, product and location, then you need to normalize these items out appropriately. At the end, you may still find yourself tempted to keep price as the only attribute in a three-way join. Alternatively, you may find that your data is actually more hierarchical (companies have locations which sell products that are fundamentally different in some meaningful way from similar products sold at other locations). In such a case the price might live on the leaf level of a tree structure.
It's really hard to say for sure what would work best for you without understanding your business rules better.
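As a rough sketch of the hierarchical case, assume each location and each product belongs to exactly one company, so that price depends only on the (location, product) pair. The `locations` table, the `product_name` column, and the sample rows are assumptions for illustration and are not from the question; SQLite is used here only so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE companies (
    company_id   INTEGER PRIMARY KEY,
    company_name TEXT NOT NULL
);
CREATE TABLE products (            -- master product list, one row per product
    product_id   INTEGER PRIMARY KEY,
    company_id   INTEGER NOT NULL REFERENCES companies(company_id),
    product_name TEXT NOT NULL     -- attributes shared by all locations
);
CREATE TABLE locations (
    location_id  INTEGER PRIMARY KEY,
    company_id   INTEGER NOT NULL REFERENCES companies(company_id)
);
CREATE TABLE location_product (    -- attributes that vary per location
    location_id   INTEGER NOT NULL REFERENCES locations(location_id),
    product_id    INTEGER NOT NULL REFERENCES products(product_id),
    product_price REAL NOT NULL,
    PRIMARY KEY (location_id, product_id)
);
""")

# Hypothetical data: one product sold at three locations at three prices.
conn.execute("INSERT INTO companies VALUES (1, 'Acme')")
conn.execute("INSERT INTO products VALUES (10, 1, 'Product A')")
conn.executemany("INSERT INTO locations VALUES (?, 1)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO location_product VALUES (?, 10, ?)",
    [(1, 9.5), (2, 10.0), (3, 10.5)],
)

prices = conn.execute("""
    SELECT lp.location_id, p.product_name, lp.product_price
    FROM location_product AS lp
    JOIN products AS p ON p.product_id = lp.product_id
    ORDER BY lp.location_id
""").fetchall()
product_rows = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(prices, product_rows)
```

The product exists once in `products` while its price varies by row in `location_product`. Nothing in this sketch stops a location from listing another company's product; if that matters, it would need an extra constraint.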
The bottom line is, you should aim for third normal form (3NF).
You probably want something like this:

City <-> Zipcode database design

I need to create a database containing all cities and zip codes worldwide. For that I want to create a table 'city' and a table 'zip_code'. My question is: what is the relation between city and zip code? Is it a 1:n relationship worldwide, or can it also be m:n in some countries?
It differs. In Holland we usually have multiple zipcodes per street. The zip codes are so fine-grained (consisting of 4 digits + 2 letters) that the zipcode and the house number alone are enough to uniquely identify a building.
In Belgium, though, there's a 4-digit postal code, and a couple of towns can have the same zipcode, while it is still possible that a larger city has multiple zipcodes. It can even happen that a city has multiple zipcodes, while each (or some) of them are shared with some smaller towns as well.
So I would almost say that there is no relation between zip code and city, or at least no one definition that works on a global scale. If you store it in a database, then it's definitely an m:n relation.
For the US at least it is M:M. A city can have multiple zips, and a zip can cover more than one city (obviously depending on how loosely you define a city vs. a town or a municipality).
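An m:n relation is usually stored with a junction table. The sketch below shows one possible layout; the table and column names beyond 'city' and 'zip_code', the `country_code` column, and the placeholder city names and codes are all assumptions, not real postal data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE city (
    city_id      INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    country_code TEXT NOT NULL
);
CREATE TABLE zip_code (
    zip_id       INTEGER PRIMARY KEY,
    code         TEXT NOT NULL,        -- text, so leading zeros survive
    country_code TEXT NOT NULL,
    UNIQUE (country_code, code)        -- codes repeat across countries
);
CREATE TABLE city_zip_code (           -- junction table for the m:n relation
    city_id INTEGER NOT NULL REFERENCES city(city_id),
    zip_id  INTEGER NOT NULL REFERENCES zip_code(zip_id),
    PRIMARY KEY (city_id, zip_id)
);
""")

# Placeholder data: a larger city with two codes, one shared with a town.
conn.executemany("INSERT INTO city VALUES (?, ?, 'XX')",
                 [(1, "Big City"), (2, "Small Town")])
conn.executemany("INSERT INTO zip_code VALUES (?, ?, 'XX')",
                 [(1, "1000"), (2, "1001")])
conn.executemany("INSERT INTO city_zip_code VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])

def zips_of(city_name):
    rows = conn.execute("""
        SELECT z.code FROM zip_code AS z
        JOIN city_zip_code AS cz ON cz.zip_id = z.zip_id
        JOIN city AS c ON c.city_id = cz.city_id
        WHERE c.name = ? ORDER BY z.code""", (city_name,))
    return [code for (code,) in rows]

def cities_of(code):
    rows = conn.execute("""
        SELECT c.name FROM city AS c
        JOIN city_zip_code AS cz ON cz.city_id = c.city_id
        JOIN zip_code AS z ON z.zip_id = cz.zip_id
        WHERE z.code = ? ORDER BY c.name""", (code,))
    return [name for (name,) in rows]

print(zips_of("Big City"), cities_of("1001"))
```

A country where the relation happens to be 1:n fits the same tables; it simply never has two cities sharing a code.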

Data constraints in US state/county/city/zip database?

I have a database of US zip codes and their corresponding states, cities and counties. It was supplied as a flat file and I'm trying to normalize the data and figure out exactly which entities depend on which others.
One problem I've come across is that some cities seem to exist in more than one county. I was under the impression that in the US, there is a hierarchy of State -> County -> City -> Zip.
However, this data seems to show otherwise for some cities:
Is my data set incorrect or is this actually a feature of US geography?
I am working on this same topic. I have learned that Virginia has cities that are not within a county. The city functions as both a city and a county but is not within any county boundary. Also, Alaska has no counties. Their equivalent is boroughs, but the whole state is not divided into boroughs. Any area not within a borough is referred to as the "unorganized borough".
No, there isn't a clean hierarchy like that.
You're also liable to find cities that straddle state borders (cities in two states), and ZIP codes that take in more than one city. Not long ago, there were ZIP codes that straddled state borders, too. (ZIP codes are more about the route followed to deliver mail than about geography.) There might still be some.
As far as I know, no county is split between two states. But if there happened to be one, it wouldn't surprise me.
Depending on your application, you might discover even weirder things. I used to have to deal with addresses in the mountains that were "in" one county geographically, but were "in" a second county for emergency services (fire, police), and "in" yet a third county for non-emergency services (water, sewer, garbage collection). It depended on where the address was in relation to mountain ridges and roads.

Database design issue:

I'm building a Volunteer Management System and I'm having some DB design issues:
To explain the process:
Volunteers can sign up for accounts. Volunteers report their hours to a project (each volunteer can have multiple projects). Volunteer supervisors are notified when a volunteer's number of hours is close to some specified amount, so they can give them a reward.
For example:
a volunteer who has volunteered 10 hours receives a free t shirt.
The problem I'm having is how to design the DB in such a way that a single reward profile can be related to multiple projects as well as have a single reward profile be "multi-tiered". A big thing about this is that rewards structures may change so they can't be just hardcoded.
Example of what I mean by "multi-tiered" reward profile:
A volunteer who has volunteered 10 hours receives a free t shirt.
A volunteer who has volunteered 40 hours receives a free $50 appreciation check.
The solutions I've come up with myself are:
To have a reward profile table that relates one row to each reward profile.
rewardprofile:
rID(primary key) - int
description - varchar / char(100)
details - varchar / file (XML)
Aside, just while on the topic, can DB field entries be files?
OR
To have a rewards table where each row relates one preset number of hours to one reward, as follows, and a second rewards profile table that binds the rewards entries together:
rewards:
rID(primary key) - int
rpID (references rewardsProfile) - int
numberOfHrs - int
rewardDesc - varchar / char(100)
rewardsprofile:
rpID(primary key) - int
description
so this might look something like:
rewardsprofile:
rpid | desc
rp01 | no reward
rp02 | t-shirt only
rp03 | t-shirt and check
rewards
rid | rpID | hours | desc
r01 | rp02 | 10 | t-shirt
r02 | rp03 | 10 | t-shirt
r03 | rp03 | 40 | check
I'm sure this issue is nothing new but my google fu is weak and I don't know how to phrase this in a meaningful way. I think there must be a solution out there more formalized than my (hack and slash) method. If anyone can direct me to what this problem is called or any solutions to it, that would be swell. Thanks for all your time!
Cheers,
-Jeremiah Tantongco
Yes, database fields can be files (type binary, character large object, or xml) depending on the implementation of the specific database.
The rewardsprofile table looks like it might be challenging to maintain if you have a large number of different rewards in the future. One thing you might consider is a structure like:
rewards:
rID(primary key) - int
numberOfHrs - int
rewardDesc - varchar / char(100)
volunteers:
vID(primary key) - int
.. any other fields you want here ..
rewardshistory:
vID (foreign key references volunteers)
rID (foreign key references rewards)
Any time you want to add a reward, you add it to the rewards table. Old rewards stay in the table (you might want a 'current' field or something to track whether the reward can still be assigned). The rewardshistory table tracks which rewards have been given to which volunteers.
This is a rough structure of how I would handle this:
Volunteers
    volunteerid
    firstname
    lastname
VolunteerAddress
    volunteerid
    Street1
    Street2
    City
    State
    Postalcode
    Country
    Addresstype (home, business, etc.)
VolunteerPhone
    volunteerid
    Phone number
    Phonetype
VolunteerEmail
    volunteerid
    EmailAddress
Project
    Projectid
    projectname
VolunteerHours
    volunteerid
    hoursworked
    projectid
    DateWorked
Rewards
    Rewardid
    Rewardtype (Continual, datelimited, etc.)
    Reward
    RewardBeginDate
    RewardEndDate
    RequiredHours
Awarded
    VolunteerID
    RewardID
    RewardDate
You will probably have some time-limited rewards; that's why I added the date fields. You would then set up a job to calculate rewards once a week or once a month or so. Make sure to exclude those who have already received that particular reward, if pertinent. (You don't want to give a new t-shirt for every 10 hours worked, do you?)
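The periodic job could be sketched roughly as follows. It uses only the VolunteerHours, Rewards and Awarded tables from the structure above, with column types guessed; date-limited rewards and per-project rules are left out to keep it short, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VolunteerHours (
    volunteerid INTEGER NOT NULL,
    projectid   INTEGER NOT NULL,
    hoursworked INTEGER NOT NULL,
    DateWorked  TEXT NOT NULL
);
CREATE TABLE Rewards (
    Rewardid      INTEGER PRIMARY KEY,
    Reward        TEXT NOT NULL,
    RequiredHours INTEGER NOT NULL
);
CREATE TABLE Awarded (
    VolunteerID INTEGER NOT NULL,
    RewardID    INTEGER NOT NULL REFERENCES Rewards(Rewardid),
    RewardDate  TEXT NOT NULL,
    PRIMARY KEY (VolunteerID, RewardID)   -- each reward at most once
);
""")

# The two tiers from the question: 10 hours and 40 hours.
conn.executemany("INSERT INTO Rewards VALUES (?, ?, ?)",
                 [(1, "t-shirt", 10), (2, "check", 40)])
# Hypothetical hours: volunteer 1 has 11 in total, volunteer 2 has 45.
conn.executemany("INSERT INTO VolunteerHours VALUES (?, ?, ?, ?)", [
    (1, 100, 6, "2024-01-05"), (1, 101, 5, "2024-01-12"),
    (2, 100, 25, "2024-01-05"), (2, 100, 20, "2024-01-19"),
])
# Volunteer 2 already received the t-shirt.
conn.execute("INSERT INTO Awarded VALUES (2, 1, '2024-01-06')")

# Rewards whose threshold is met and which have not been awarded yet.
due = conn.execute("""
    SELECT h.volunteerid, r.Rewardid
    FROM (SELECT volunteerid, SUM(hoursworked) AS total
          FROM VolunteerHours GROUP BY volunteerid) AS h
    JOIN Rewards AS r ON h.total >= r.RequiredHours
    LEFT JOIN Awarded AS a
           ON a.VolunteerID = h.volunteerid AND a.RewardID = r.Rewardid
    WHERE a.VolunteerID IS NULL
    ORDER BY h.volunteerid, r.Rewardid
""").fetchall()
print(due)
```

The LEFT JOIN against Awarded is what keeps a volunteer from getting the same reward twice. A reward meant to repeat (the "Continual" type) would need a different rule.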
Yes, DB field entries can be files. Or, more precisely, they can be filespecs that reference files. Is that what you really meant?
While we are on the subject of data fields that reference other data, how much do you know about foreign keys? What can you accomplish with references to files that you couldn't accomplish even better by the judicious use of foreign keys?
Foreign keys, and the keys that they refer to, are fundamental concepts in the relational model of data. Without this model, your database design is going to be pretty random.
Morning,
You really must place all your tables on a chart then determine the business rules for that chart in the entity relationship diagram. Once you decide what the direct relationships are between each and every table only then would you test to see if you get the desired answers. This procedure is called database design and it appears that you didn't do that as of yet but got ahead of yourself a little bit from what I see.
There are plenty of good books on database design on the market. The one I use is "Database Design For Mere Mortals". It is very easy to read and understand.
Hope this helps.

creating a address database

I am re-creating a part of my company’s database because it does not meet future needs.
Currently we have mainly a flat file and some disjoined tables that were never fully realized.
My way of thinking is we have a table for each category except maybe the zips table, which may serve as a connect it all together table.
Please refer to image below:
Database Diagram http://www.freeimagehosting.net/uploads/248cc7e884.jpg
One thing I am thinking of is removing the zip table and just putting the zip code in the zipstocities table since the zip code is almost unique and then indexing the table on the zip code. The only downside is zip code has to be a varchar to take care of zip codes with leading zeros. Just want to know if there is a flaw in my logic.
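A small illustration of the leading-zero concern, using the New Hampshire code 03579 as the sample value. The padding width of 5 is an assumption that holds only for plain five-digit US codes.

```python
zip_text = "03579"

as_number = int(zip_text)             # a numeric column would store 3579
round_trip = str(as_number)           # the leading zero is gone
padded = str(as_number).zfill(5)      # recoverable only if the width is known

print(as_number, round_trip, padded)
```

Storing the code as text avoids the padding step entirely, and a text column can still be indexed.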
I don't know the US ZIP code and territorial division system well, but I assume it's somewhat like the German one.
A state has many counties.
A county has many cities.
A city has many zip codes.
Hence I would use the following schema.
ZipCodes            CityZipCodes
------------        ----------------        Cities
ZipCode (PK) <───   ZipCode (PK)(FK)        -----------
                    City    (PK)(FK) ───>   CityId (PK)
                                            Name
                                            County (FK) ───┐
                                                           │
                                                           │
                    Counties                               │
                    -------------                          │
States              CountyId (PK) <────────────────────────┘
-----------------   Name
StateId (PK) <───   State (FK)
Name
Abbreviation
Fixed for multiple cities per ZIP code.
One thing you should be aware of is that not all cities are in counties. In Virginia you are in either a city or county but never both.
Looking at the diagram you have, the state table is the only one of the 4 outside tables that is really necessary. Lookup tables with just an ID and a single value aren't worth the effort. These relationships are designed to make a single value in the main table (ziptocities) refer to a set of related data in the lookup table (states).
You'll need to ask yourself why you care about counties. In many states in the US, they have little importance beyond tradition and maps.
The other question will be how important will it be that the address be accurate? How many deaths will there be if important letters are not delivered in a timely manner (possibly many if the letter is about prescription drug recalls!)
You probably want to think about using data from the Postal Service, possibly using a product that corrects addresses. That way, when you get a good address, you'll be certain the mail can be delivered there - because the Postal Service will have said so!
There seem to be flaws in both your process and your logic.
I suggest that you stop thinking about tables and relationships for a moment. Instead, think about facts. Make a list of valid addresses that your database needs to support. Many surprises await you.
Don't confuse an address with a mailing label. They're not at all the same thing. Consider modeling carriers, too. In the US, whether an address is valid depends on the carrier. For example, my PO box is a valid address when the carrier is the USPS, but not when the carrier is UPS.
To save time, you might try browsing some international address formats on bitboost.
Will your logic work if two countries happen to have the same zip code? The two would be pointing to different cities in that case. Here are some points to consider:
- Do you want to use zipcode as a kind of primary key into address (at least for the city, state and country fields)? In that case, you can have zipcode, city, state and country in one table. Create indexes on city, state etc. You then have a functional dependency of the form zipcode -> country, state, city. This, as I said, may not be true across countries.
- If auto-populating is your only concern, create a materialized view and use it.
I would recommend reading 'Data Model patterns' by David C. Hay.
But not every person who has a valid medical claim is required by law to remain in the US until the claim is settled. People move.
San Francisco is a city in California; it's not a city in Alabama. Does your design prevent nonsense entries like "San Francisco, AL"?
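One way a design can reject such entries is a composite foreign key from the address to a table of valid (city, state) pairs. The sketch below assumes hypothetical `states`, `cities` and `addresses` tables with invented street values, and uses SQLite only so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE states (
    abbreviation TEXT PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE cities (
    name  TEXT NOT NULL,
    state TEXT NOT NULL REFERENCES states(abbreviation),
    PRIMARY KEY (name, state)
);
CREATE TABLE addresses (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL,
    city       TEXT NOT NULL,
    state      TEXT NOT NULL,
    -- the (city, state) pair must exist as a pair, not just each half
    FOREIGN KEY (city, state) REFERENCES cities(name, state)
);
""")

conn.executemany("INSERT INTO states VALUES (?, ?)",
                 [("CA", "California"), ("AL", "Alabama")])
conn.execute("INSERT INTO cities VALUES ('San Francisco', 'CA')")
conn.execute("INSERT INTO addresses (street, city, state) "
             "VALUES ('1 Example St', 'San Francisco', 'CA')")

try:
    conn.execute("INSERT INTO addresses (street, city, state) "
                 "VALUES ('2 Example St', 'San Francisco', 'AL')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

address_count = conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
print(rejected, address_count)
```

Separate single-column foreign keys on city and on state would accept "San Francisco, AL", because each value is valid on its own; only the pair is wrong.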

Resources