Are there any good references for best practices for storing postal addresses in an RDBMS? It seems there are lots of tradeoffs that can be made, and lots of pros and cons to each to be evaluated. Surely this has been done time and time again? Maybe someone has at least written down some lessons learned somewhere?
Examples of the tradeoffs I am talking about: storing the ZIP code as an integer vs. a char field; whether the house number should be stored as a separate field or as part of address line 1; whether suite/apartment/etc. numbers should be normalized or just stored as a chunk of text in address line 2; how to handle ZIP+4 (separate fields or one big field, integer vs. text); etc.
I'm primarily concerned with U.S. addresses at this point, but I imagine there are some best practices for preparing for the eventuality of going global as well (e.g. naming fields appropriately, like region instead of state or postal code instead of zip code).
For more international use, one schema to consider is the one used by Drupal Address Field. It's based on the xNAL standard, and seems to cover most international cases. A bit of digging into that module will reveal some nice pearls for interpreting and validating addresses internationally. It also has a nice set of administrative areas (province, state, oblast, etc.) with ISO codes.
Here's the gist of the schema, copied from the module page:
country => Country (always required, 2 character ISO code)
name_line => Full name (default name entry)
first_name => First name
last_name => Last name
organisation_name => Company
administrative_area => State / Province / Region (ISO code when available)
sub_administrative_area => County / District (unused)
locality => City / Town
dependent_locality => Dependent locality (unused)
postal_code => Postal code / ZIP Code
thoroughfare => Street address
premise => Apartment, Suite, Box number, etc.
sub_premise => Sub premise (unused)
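As a rough illustration, here is that field list expressed as a SQLite table via Python's standard `sqlite3` module. The table name `address`, the all-TEXT columns, and making only `country` mandatory are assumptions made for this sketch (in the spirit of the lessons below); it is not the Drupal module's actual storage schema.

```python
import sqlite3

# Hypothetical table mirroring the Address Field (xNAL-based) field list above.
# Every column is TEXT; only the 2-character ISO country code is mandatory.
SCHEMA = """
CREATE TABLE address (
    id                      INTEGER PRIMARY KEY,
    country                 TEXT NOT NULL CHECK (length(country) = 2),
    name_line               TEXT,
    first_name              TEXT,
    last_name               TEXT,
    organisation_name       TEXT,
    administrative_area     TEXT,  -- ISO code when available
    sub_administrative_area TEXT,
    locality                TEXT,
    dependent_locality      TEXT,
    postal_code             TEXT,  -- stored as text, never as a number
    thoroughfare            TEXT,
    premise                 TEXT,
    sub_premise             TEXT
)
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database containing the address table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


conn = make_db()
# An illustrative UK-style address with no administrative area at all.
conn.execute(
    "INSERT INTO address (country, locality, postal_code, thoroughfare) "
    "VALUES (?, ?, ?, ?)",
    ("GB", "London", "SW1A 2AA", "10 Downing Street"),
)
print(conn.execute("SELECT country, locality, postal_code FROM address").fetchall())
```

A three-letter or missing country code is rejected by the constraints, while every other field may be left empty.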
A few lessons I've learned:
Don't store anything numerically.
Store country and administrative area as ISO codes where possible.
When you don't know, be lax about requiring fields. Some country may not use fields you take for granted, even basic things like locality & thoroughfare.
As an 'international' user, there is nothing more frustrating than dealing with a website that is oriented around only US-format addresses. It's a little rude at first, but becomes a serious problem when the validation is also over-zealous.
If you are concerned with going global, the only advice I have is to keep things free-form. Different countries have different conventions - in some, the house number comes before the street name, in some it comes after. Some have states, some regions, some counties, some combinations of those. Here in the UK, the zipcode is not a zipcode, it's a postcode containing both letters and numbers.
I'd simply advise about 10 lines of variable-length strings, together with a separate field for a postcode (and be careful how you label that field, to cope with national sensibilities). Let the user/customer decide how to write their address.
If you need comprehensive information about how other countries use postal addresses, here's a very good reference link (Columbia University):
Frank's Compulsive Guide to Postal Addresses
Effective Addressing for International Mail
You should definitely consider storing the house number as a character field rather than a number, because of special cases such as "half-numbers", or my current address, which is something like "129A", where the A is not considered an apartment number by delivery services.
I've done this (rigorously model address structures in a database), and I would never do it again. You can't imagine how crazy the exceptions are that you'll have to take into account as a rule.
I vaguely recall some issue with Norwegian postal codes (I think), which were all 4 positions, except Oslo, which had 18 or so.
I'm absolutely sure that from the moment we started using the geographically correct ZIP codes for all of our own national addresses, quite a few people started complaining that their mail arrived too late. It turned out those people lived near a border between postal areas. Someone might really live in postal area 1600, say, but in practice his mail should be addressed to postal area 1610, because it was that neighbouring postal area that actually served him. Sending his mail to his "correct" postal area made it take a couple of days longer to arrive, because the correct post office first had to forward it to the "incorrect" postal area ...
(We ended up registering those people with an address abroad in the country with ISO-code 'ZZ'.)
Unless you are going to do maths on the street numbers or zip / postal codes, you are just inviting future pain by storing them as numerics.
You might save a few bytes here and there, and maybe get a faster index, but what do you do when the US Postal Service, or whatever other country's postal service you are dealing with, decides to introduce letters into the codes?
Disk space is going to be a lot cheaper than fixing it later on... Y2K, anybody?
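A small Python illustration of that future pain: round-tripping a postal code through an integer silently drops leading zeros, and alphanumeric or hyphenated codes cannot be converted at all. The helper name `try_as_int` and the sample codes are purely illustrative.

```python
def try_as_int(code: str):
    """Return the code converted to int, or None if it is not purely numeric."""
    try:
        return int(code)
    except ValueError:
        return None


samples = ["02134", "K1A 0B1", "SW1A 2AA", "12345-6789"]
for code in samples:
    n = try_as_int(code)
    if n is None:
        print(f"{code!r}: cannot be stored as a number at all")
    else:
        # The leading zero is gone once the value has been through an int.
        print(f"{code!r} -> {n} -> {str(n)!r}, round trip ok: {str(n) == code}")
```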
Adding to what @Jonathan Leffler and @Paul Fisher have said:
If you ever anticipate having postal addresses for Canada or Mexico added to your requirements, storing postal-code as a string is a must. Canada has alpha-numeric postal codes and I don't remember what Mexico's look like off the top of my head.
You should certainly consult "Is this a good way to model address information in a relational database", but your question is not a direct duplicate of that.
There are surely a lot of pre-existing answers (check out the example data models at DatabaseAnswers, for example). Many of the pre-existing answers are defective under some circumstances (not picking on DB Answers at all).
One major issue to consider is the scope of the addresses. If your database must deal with international addresses, you have to be more flexible than if you only have to deal with addresses in one country.
In my view, it is often (which does not mean always) sensible to both record the 'address label image' of the address and separately analyze the content. This allows you to deal with differences between the placement of postal codes, for example, between different countries. Sure, you can write an analyzer and a formatter that handle the eccentricities of different countries (for instance, US addresses have 2 or 3 lines; by contrast, British addresses can have considerably more; one address I write to periodically has 9 lines). But it can be easier to have the humans do the analysis and formatting and let the DBMS just store the data.
I've found that listing all possible fields from the smallest discrete unit to the largest is the easiest way. Users will fill in the fields they see fit. My address table looks like this:
*********************************
Field Type
*********************************
address_id (PK) int
unit string
building string
street string
city string
region string
country string
address_code string
*********************************
Where's the "trade off" in storing the ZIP as a NUMBER or VARCHAR? That's just a choice -- it's not a trade off unless there are benefits to both and you have to give up some benefits to get others.
Unless the sum of ZIP codes has some meaning, storing ZIP codes as numbers is not useful.
This might be an overkill, but if you need a solution that would work with multiple countries and you need to programmatically process parts of the address:
You could have country-specific address handling using two tables: one generic table with 10 VARCHAR2 columns and 10 NUMBER columns, and another table that maps these fields to prompts and has a country column tying an address structure to a country.
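A minimal sketch of that two-table idea in SQLite, assuming hypothetical table names (`generic_address`, `address_field_map`), only four generic text columns instead of 10 + 10 for brevity, and made-up US/GB mappings. The mapping table decides which generic column means what, and which prompt to show, per country.

```python
import sqlite3

# Hypothetical tables: a generic address row with anonymous text columns and a
# per-country mapping saying which column holds what and how to label it.
DDL = """
CREATE TABLE generic_address (
    id      INTEGER PRIMARY KEY,
    country TEXT NOT NULL,
    txt1 TEXT, txt2 TEXT, txt3 TEXT, txt4 TEXT
);
CREATE TABLE address_field_map (
    country  TEXT    NOT NULL,
    col_name TEXT    NOT NULL,  -- which generic column is used
    prompt   TEXT    NOT NULL,  -- label shown to users from this country
    position INTEGER NOT NULL,  -- display order on the form / label
    PRIMARY KEY (country, col_name)
);
"""
GENERIC_COLUMNS = {"txt1", "txt2", "txt3", "txt4"}


def demo_db() -> sqlite3.Connection:
    """Build an in-memory database with illustrative US and GB mappings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany(
        "INSERT INTO address_field_map VALUES (?, ?, ?, ?)",
        [
            ("US", "txt1", "Street address", 1), ("US", "txt2", "City", 2),
            ("US", "txt3", "State", 3), ("US", "txt4", "ZIP code", 4),
            ("GB", "txt1", "Street address", 1), ("GB", "txt2", "Town", 2),
            ("GB", "txt4", "Postcode", 3),
        ],
    )
    conn.execute(
        "INSERT INTO generic_address VALUES "
        "(1, 'GB', '1 High Street', 'Oxford', NULL, 'OX1 1AA')"
    )
    return conn


def labelled_address(conn, address_id):
    """Return [(prompt, value), ...] for one address, in its country's order."""
    (country,) = conn.execute(
        "SELECT country FROM generic_address WHERE id = ?", (address_id,)
    ).fetchone()
    mapping = conn.execute(
        "SELECT col_name, prompt FROM address_field_map "
        "WHERE country = ? ORDER BY position",
        (country,),
    ).fetchall()
    result = []
    for col, prompt in mapping:
        # Column names cannot be bound as parameters, so whitelist them.
        if col not in GENERIC_COLUMNS:
            raise ValueError(f"unexpected column in mapping: {col}")
        (value,) = conn.execute(
            f"SELECT {col} FROM generic_address WHERE id = ?", (address_id,)
        ).fetchone()
        result.append((prompt, value))
    return result


print(labelled_address(demo_db(), 1))
```

The cost of this flexibility is that the database itself no longer knows what `txt2` means; every query has to go through the mapping.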
If you ever have to verify an address or use it to process credit card payments, you'll at least need a little structure. A free-form block of text does not work very well for that.
Zip code is a common optional field for validating payment card transactions without using the whole address. So have a separate and generously sized field for that (at least 10 chars).
Inspired by Database Answers
Line1
Line2
Line3
City
Country_Province
PostalCode
CountryId
OtherDetails
At the moment, I'm developing an international ecommerce website.
It should cover almost all addresses in the world, as shown below:
*****************************************************************
Type Field name Displayed name in your form
*****************************************************************
INT id (PK)
VARCHAR(100) building Apt, office, suite, etc. (Optional)
VARCHAR(100) street Street address
VARCHAR(100) city City
VARCHAR(100) state State, province or prefecture
VARCHAR(100) zip_code Zip code
VARCHAR(100) country Country
*****************************************************************
I would just put all the fields together in one large NVARCHAR(1000) field, with a textarea element for the user to enter the value in (unless you want to perform analysis on, e.g., zip codes). All those address line 1, address line 2, etc. inputs are just so annoying if you have an address that doesn't fit well with that format (and, you know, there are other countries than the US).
Related
My question:
How would I implement a database design on the back-end and the front-end to accommodate a varying number of address lines such as AddressLine1, AddressLine2, AddressLine3, etc. into infinity while maintaining an intuitive front-end user experience. I want this to maximize the cleanliness and ease of merging documents later on once the database has been developed. Some addresses have only one street line while others can even have five or maybe more.
Background:
I am very new to data modeling and database design. I don't yet understand the consequences that database modeling will have on how the forms on the front-end will have to be designed and the headaches that may go along with a particular design. Therefore, I'm not sure if what I'm seeking is a big mistake.
I'm designing a case management database for a law firm. We plan to create a separate Addresses table and have a many-to-many relationship between the people/entities and the addresses--i.e., many people/entities may have many addresses and the same address may belong to many people/entities.
Thank you!
Typically, for an address, the data is not normalized over the lines. So, an address table would just have fields like AddressLine1 and AddressLine2.
The bigger geography information (example: city, state, country, postal code) would be stored in separate fields in the address record.
The reason for this is quite practical. Addresses are typically printed, and there is a limited amount of printing space available. If there are four lines, for instance, you have the name, address line 1, address line 2, and city/state/country/postal code.
If you really needed to store an unlimited number of lines, you would do it with an AddressLines table. The AddressLines table would have fields, such as:
AddressId -- the address record it belongs to
LineNumber
LineContents
However, this seems like overkill.
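For completeness, a minimal sketch of that AddressLines design in SQLite, using the column names from the answer; the `Address` table's other columns and the helper functions are assumptions for illustration. The composite primary key keeps line numbers unique per address, and ORDER BY LineNumber rebuilds the lines in order.

```python
import sqlite3

# Hypothetical schema following the AddressId / LineNumber / LineContents idea.
DDL = """
CREATE TABLE Address (
    AddressId  INTEGER PRIMARY KEY,
    City       TEXT,
    PostalCode TEXT,
    Country    TEXT
);
CREATE TABLE AddressLines (
    AddressId    INTEGER NOT NULL REFERENCES Address(AddressId),
    LineNumber   INTEGER NOT NULL,
    LineContents TEXT    NOT NULL,
    PRIMARY KEY (AddressId, LineNumber)
);
"""


def save_address(conn, address_id, lines, city, postal_code, country):
    """Store the address header plus any number of street lines."""
    conn.execute("INSERT INTO Address VALUES (?, ?, ?, ?)",
                 (address_id, city, postal_code, country))
    conn.executemany(
        "INSERT INTO AddressLines VALUES (?, ?, ?)",
        [(address_id, n, text) for n, text in enumerate(lines, start=1)],
    )


def address_lines(conn, address_id):
    """Return the street lines for one address, in their original order."""
    rows = conn.execute(
        "SELECT LineContents FROM AddressLines WHERE AddressId = ? "
        "ORDER BY LineNumber",
        (address_id,),
    )
    return [text for (text,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
save_address(conn, 1, ["c/o Smith & Jones LLP", "Suite 400", "12 Main Street"],
             "Springfield", "12345", "US")
print(address_lines(conn, 1))
```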
Your bigger problem is standardizing addresses. Have you given that any thought? (You know: "101 6th Avenue", "101 Sixth Ave.", and "101 Avenue of the Americas" are all the same address in New York City.)
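To give a flavour of what standardizing involves, here is a toy Python canonicalizer for the three New York examples. The lookup tables are tiny and hypothetical; real systems usually rely on postal-authority reference data (for the US, the USPS Publication 28 abbreviations) or an address-verification service rather than hand-written rules.

```python
import re

# Deliberately tiny, hypothetical lookup tables for illustration only.
# Note that "ST" is ambiguous in practice (STREET vs. SAINT).
SUFFIXES = {"AVE": "AVENUE", "AV": "AVENUE", "ST": "STREET", "RD": "ROAD"}
ORDINALS = {"1ST": "FIRST", "2ND": "SECOND", "3RD": "THIRD",
            "4TH": "FOURTH", "5TH": "FIFTH", "6TH": "SIXTH"}
ALIASES = {"AVENUE OF THE AMERICAS": "SIXTH AVENUE"}  # NYC-specific alias


def canonical(street_line: str) -> str:
    """Upper-case, strip punctuation, expand abbreviations, apply aliases."""
    words = re.sub(r"[^\w\s]", " ", street_line.upper()).split()
    words = [ORDINALS.get(w, SUFFIXES.get(w, w)) for w in words]
    line = " ".join(words)
    for alias, official in ALIASES.items():
        line = line.replace(alias, official)
    return line


for s in ["101 6th Avenue", "101 Sixth Ave.", "101 Avenue of the Americas"]:
    print(f"{s!r} -> {canonical(s)!r}")
```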
I am trying to find a better approach for storing people's names in a table.
What are the benefits of 3 fields over 1 field for storing a person's name?
UPDATE
Here is an interesting discussion and resources about storing names and user experience
Merging firstname/last name into one field
You can always construct a full name from its components, but you can't always deconstruct a full name into its components.
Say you want to write an email starting with "Dear Richie" - you can do that trivially if you have a given_name field, but figuring out what someone's given name is from their full name isn't trivial.
You can also trivially search or sort by given_name, or family_name, or whatever.
(Note I'm using given_name, family_name, etc. rather than first_name, last_name, because different cultures put their names in different orders.)
Solving this problem in the general case is hard - here's an article that gives a flavour of how hard it is: Representing People's Names in Dublin Core.
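A minimal Python sketch of the "construct, don't deconstruct" point, using given_name/family_name as suggested. The class, its methods, the `family_name_first` flag and all names other than "Richie" are hypothetical; the point is only that composing, greeting and sorting are trivial once the components are stored separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    given_name: str
    family_name: Optional[str] = None  # None for people with a single name
    family_name_first: bool = False    # True where the family name is written first

    def full_name(self) -> str:
        parts = [self.given_name, self.family_name]
        if self.family_name_first:
            parts.reverse()
        return " ".join(p for p in parts if p)

    def greeting(self) -> str:
        return f"Dear {self.given_name}"

    def sort_key(self) -> str:
        # Sort by family name when there is one, otherwise by the given name.
        return (self.family_name or self.given_name).casefold()


people = [PersonName("Taro", "Yamada", family_name_first=True),
          PersonName("Richie", "Smith")]
for p in sorted(people, key=PersonName.sort_key):
    print(p.full_name(), "|", p.greeting())
```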
Keep your data as clean as you can!
How?
Ask your users only for what you absolutely need at the time you ask.
How you store the name does not matter. What does matter is that
the user experience is as good as can be
you don't have false data in your system
If you annoy the users with mandatory fields to fill in and re-question them several times, they can get upset and not buy into your application right there and then. You want to avoid bad user experiences at all times.
No user cares how easy it is for you to search your database for his middle name. He wants to have an easy, feel-good experience, that's it.
What do users do if they are forced to input data like their postal address, or even email address when they only want a "read-only" account with no notifications needed? They put garbage data into your system. This will render your super search and sort algorithms useless anyway.
Thus, my advice for any app would be to gather only as much information from your users as you really need in order to serve them, no more.
If, for example, you run an online shop for pet food, don't ask your users at sign-up what kind of pets they own. Make it an option for them to fill in once they are logged in and all happy (new customers). Don't ask for their postal address until they order stuff that is actually carried to their house, stuff they pay for and thus care that YOU have their exact coordinates.
This will lead to much better data quality, and that is what you should care about, not technical details the user gets no benefit from....
In your example I would just ask for the full name (not sure though) and once the user willingly subscribes to your newsletter, let the user decide how he/she wants to be addressed...
As others have said, how do you decompose a full name into its component parts?
Colin Angus Mackay
Jean Michel Jarre
Vincent van Gogh
Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso
How do you reliably decompose that lot?
To learn more, see falsehoods programmers believe about names.
I was looking up the Spanish Civil War the other day, and found this exception to most rules:
Francisco Paulino Hermenegildo Teódulo Franco y Bahamonde, Salgado y Pardo de Andrade
Father: Nicolás Franco y Salgado-Araújo
Mother: María del Pilar Bahamonde y Pardo de Andrade
Next time I'm working on a system that has to store names, I'm going to try something radical: designing from the requirements.
What are we going to use the names for?
Name on an address label for the postal service
Greeting on the website
Informal name
Based on what the names will be used for, we'd determine how much information to store. Maybe we allow the user to enter all three of those, including line breaks in the first case (Generalissimo Franco might want his full titles and appointments listed, if he weren't still dead). Maybe we provide First, Middle, Last, Generation as an option, and fill in the rest as defaults. Maybe we offer other common options like Surname, Given Name.
This is in contrast to the old-style First, Middle, Last we've used since before I started programming in COBOL back in 1975, and have "made fit" ever since.
Unfortunately this is kind of like asking what the best way is to store a number in the database. It depends on what you are going to do with it: sometimes you want an int, other times a byte, and sometimes a float. With names it depends on things like what cultures you expect your users to come from, what you plan on doing with the names (will you be using these names to connect with another system that stores names as "last name, first name"?), and how much you can afford to annoy your users. If this is an internal HR application, you can probably afford to annoy the users a lot, and have a very structured, formal breakdown of name components (there are way more than 3: don't forget Mr/Mrs, Jr, III, multiple middle names, hyphenated last names, and who knows what else if you are trying to handle names from all cultures). If you have a web app that users might or might not care about, you can't ask them to care too much.
You may want to search on the 3 separate fields individually, and it's inexpensive to concatenate them for the full name.
e.g. If you want to search for all the Mr. Nolans your query would be
SELECT Title + ' ' + FirstName + ' ' + Surname AS FullName
FROM Persons -- hypothetical table name
WHERE Title = 'Mr' AND Surname = 'Nolan'
to do this with just the fullnames would be a pain.
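The same query can be tried out in SQLite from Python. The `Persons` table and its rows are made up for illustration, and SQLite concatenates strings with `||` rather than SQL Server's `+`. Because the fields are separate, filtering on Title and Surname and sorting by FirstName are plain column operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with the three separate name fields from the answer above.
conn.execute("CREATE TABLE Persons (Title TEXT, FirstName TEXT, Surname TEXT)")
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?)", [
    ("Mr", "Christopher", "Nolan"),
    ("Mrs", "Ann", "Nolan"),
    ("Mr", "Sean", "Nolan"),
    ("Mr", "John", "Smith"),
])

rows = conn.execute(
    "SELECT Title || ' ' || FirstName || ' ' || Surname AS FullName "
    "FROM Persons WHERE Title = 'Mr' AND Surname = 'Nolan' "
    "ORDER BY FirstName"
).fetchall()
print(rows)  # [('Mr Christopher Nolan',), ('Mr Sean Nolan',)]
```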
I'm English and only have one name. I normally put it in the 'surname' field for least aggravation. I am usually forced to put something in the 'first name' field too, which by definition is wrong.
Any attempt to impose anything more than 'Name' is doomed to be wrong at least some of the time, and sometimes very frustrating to users. Single names are common in Southern India, Indonesia, and Pakistan (which is hundreds of millions of people), as well as among the occasional weirdo in the UK like me.
The 'first, middle, last' thing is very U.S.-centric. Few other countries think of names that way. Please stop doing it.
Keeping the fields separate allows you to support different output formats and cultures where the family name is written first
Things like ORDER BY firstname or ORDER BY lastname are possible when you break the name up into multiple fields.
Not as easy to do when you mash all names into one field.
About the only thing I can think of is for searching purposes. It's a bit better to search a field using [=] rather than say [like].
If you have no need to display the name as separate words, then go with a single field.
But if you need to do something like [Dear Mr. Achu] then perhaps a 3 field approach would be better.
Most of the time it's there to support writing form letters like, "Mr. so-and-so", or to search/sort by last name which is very common.
Given that first/middle/last may not apply to all cultures, there could be a better approach. It might be better expressed as "informal name" / "formal name" / "legal name" or something like that.
Still, at this point first/middle/last is very common, and from a data entry standpoint it is what everyone expects.
Here's the thing, not even humans can get this right all the time, there's just too much data, and too many special cases. I could change my name right now to be 20 parts, with the middle 13 as my "first" name. Parts of names can contain any number of words, and there can be any number of parts of names. Some people only have 1 name (no surname). Some people have lots of middle names. Some people have first or surnames composed of several words. Some people list their surname first. Some people go by their middle name. Some people go by nicknames that aren't obviously related to their given name.
If you try to guess these conventions in software YOU WILL FAIL. Period. Maybe you'll get it right some of the time, maybe even most of the time, but is even that worth it? In my opinion you should store names as one field and stop trying to be cute by using first names to refer to a person. If you need additional information about a name (e.g. a nickname), ask the user!
Each of the individual names is an atomic piece of data. When they are stored separately then it is easier to print them out in different formats such as Firstname Lastname and Lastname, Firstname.
There is no benefit if you never need to sort or search by first, middle, or last name.
Flexibility.
For example, someone might have a double-barrelled last name and no middle name.
I voted up some of these answers, but if you are looking to avoid repetitive or redundant or messy concatenation in your code, you can always use a computed column in the database or a method in a class which exposes the name consistently reconstructed. If these concatenations are expensive (because you are printing a million statements), you can use a persisted column.
Often you will allow users to specify names like nicknames or friendly names, so that you aren't referring to them by the name in their records or always as Mr. Smith.
It all depends on your requirements. There is no single good answer without the environment it is expected to satisfy.
Not sure how practical it would be, but maybe if cultural sensitivity is important in the context of the application being developed, perhaps a name should be a collection with each element of the collection carrying a value indicating if the name is the addressable "first name" or the addressable "surname" and so on for "title" or anything else that needs to be identified. A name ID could be used to identify the order of the elements for re-composing the full name.
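A minimal sketch of that "name as a collection" idea in Python. The `NamePart` type and the role labels ("given", "surname", ...) are hypothetical choices for illustration; the position field plays the role of the name ID that fixes the order for re-composing the full name, and compound surnames fall out naturally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamePart:
    position: int  # order used to re-compose the full name
    value: str
    role: str      # hypothetical roles: "title", "given", "surname", "other"


def full_name(parts):
    """Re-compose the full name in its original order."""
    return " ".join(p.value for p in sorted(parts, key=lambda p: p.position))


def addressable(parts, role):
    """Join all parts with the given role, in order (e.g. a compound surname)."""
    return " ".join(p.value for p in sorted(parts, key=lambda p: p.position)
                    if p.role == role)


van_gogh = [NamePart(3, "Gogh", "surname"),
            NamePart(1, "Vincent", "given"),
            NamePart(2, "van", "surname")]
print(full_name(van_gogh), "/", addressable(van_gogh, "surname"))
```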
Just have two fields, 'Full Name' and 'Preferred Name'. Easy. This supports every name in existence (as long as the language has lexical symbols, so, yes, that excludes languages that do not have a written form).
Just make sure that they are handled in some unicode format, and that application code properly handles unicode conversion.
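One concrete part of "handling Unicode properly" is normalization: the same visible name can arrive as different code point sequences. A small sketch using Python's standard `unicodedata.normalize` with NFC; normalizing before storing or comparing is a common convention, but the helper name `normalize_name` is an assumption for this example.

```python
import unicodedata


def normalize_name(name: str) -> str:
    # NFC composes "e" + COMBINING ACUTE ACCENT into the single code point "é",
    # so visually identical names compare (and index) equally.
    return unicodedata.normalize("NFC", name).strip()


composed = "Jos\u00e9"     # é as one code point
decomposed = "Jose\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT
print(composed == decomposed)                                  # False
print(normalize_name(composed) == normalize_name(decomposed))  # True
```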
To me it is simply better to store 3 names, so that explicit parsing is not necessary later on if the individual components are needed.
You can't always separate surname from full name cleanly and reliably so there's good reason to separate that because you often need surname. After you do that, there are two common approaches:
1. first_name and middle_name; or
2. given_names.
(2) is arguably preferable because people sometimes have more than two given names, and (1) is inflexible in this regard.
Also, another common field is preferred_name (in addition to the above).
The i18n issue can be a bugger either way. Certain cultures use the surname first and the given name last, which kills the idea of first and last names, so we move to fields for surnames and given names. Wait: some cultures don't have a surname, or the surname is modified by the gender of the person named.
We can get into tribal cultures where a person is renamed on reaching adulthood: Sitting Bull's childhood name was "Jumping Badger".
This is somewhat of a ramble, but what I am showing is that the more fields you have, the more accurate the design is. There should be at least a NOT NULL 'given name' field and an optional 'surname' field, tied to an integer PK. If these requirements are observed, fields can be added later without breaking queries.
Some of the issues can be solved by also storing an additional column like PreferredName. We do that in our DB and also store prefix column and a suffix column.
e.g.
'Prof Henry W Jones Jnr' with preferred name as 'Indiana Jones'.
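Putting several of these suggestions together, here is a SQLite sketch with a mandatory given-names column, an optional surname, and prefix, suffix and preferred-name columns. The table layout and the `display_name` rule (preferred name wins, otherwise join whatever parts exist) are assumptions for illustration; the second row shows a single-name person, as is common in Indonesia.

```python
import sqlite3

# Hypothetical schema combining the suggestions above.
DDL = """
CREATE TABLE person (
    person_id      INTEGER PRIMARY KEY,
    prefix         TEXT,
    given_names    TEXT NOT NULL,
    surname        TEXT,          -- NULL for people with a single name
    suffix         TEXT,
    preferred_name TEXT
)
"""


def display_name(row):
    """Use the preferred name when present, else join the non-empty parts."""
    prefix, given, surname, suffix, preferred = row
    if preferred:
        return preferred
    return " ".join(p for p in (prefix, given, surname, suffix) if p)


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.executemany(
    "INSERT INTO person (prefix, given_names, surname, suffix, preferred_name) "
    "VALUES (?, ?, ?, ?, ?)",
    [("Prof", "Henry W", "Jones", "Jnr", "Indiana Jones"),
     (None, "Suharto", None, None, None)],
)
names = [display_name(r) for r in conn.execute(
    "SELECT prefix, given_names, surname, suffix, preferred_name "
    "FROM person ORDER BY person_id")]
print(names)
```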
Given that state information is implicit in the zip code, isn't storing both of them a violation of third normal form? Can or should you simply combine them into one field?
According to this post, there are a few zip codes that cross state boundaries. So no, it is not a violation of 3NF.
Actually, there are a few rare cases where a ZIP Code crosses state boundaries. Usually it is due to access problems, such as being on a military base, or due to constraints of the transportation network.
One such case is Protem, Missouri (ZIP Code 65733). Some of the Arkansas roads north of Bull Shoals Lake can best be accessed by the Protem delivery unit rather than an Arkansas post office. Some examples of such roads include Ann Street, Kalijah Road, McBride Road, Red Oak Lane, and Vance Road on Highway Carrier Route H002 in ZIP Code 65733. McBride Road actually crosses across the state boundary. If you look at the road network in an online mapping program, you can see that a rural carrier from say, nearby Diamond City, AR (ZIP code 72644), on the south side of Bull Shoals Lake, would need to drive several miles to be able to access the roads listed above.
For another example, Fort Campbell, Kentucky (ZIP Code 42223) also has some roads that exist within Tennessee.
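The 3NF argument comes down to whether ZIP → state is a functional dependency, i.e. whether every ZIP maps to exactly one state. A short Python check over the (ZIP, state) pairs mentioned above makes that concrete; the pair list is assembled from the examples in this answer, and the helper name `violations` is illustrative.

```python
from collections import defaultdict

# (zip_code, state) pairs taken from the examples above: 65733 (Protem, MO)
# also serves some Arkansas roads, 42223 (Fort Campbell, KY) has roads in
# Tennessee, and 72644 is Diamond City, AR.
observations = [
    ("65733", "MO"), ("65733", "AR"),
    ("42223", "KY"), ("42223", "TN"),
    ("72644", "AR"),
]


def violations(pairs):
    """Return determinant values that map to more than one dependent value.

    An empty result means the sample is consistent with zip -> state being a
    functional dependency; any entry proves that it is not one."""
    seen = defaultdict(set)
    for zip_code, state in pairs:
        seen[zip_code].add(state)
    return {z: sorted(s) for z, s in seen.items() if len(s) > 1}


print(violations(observations))
# {'65733': ['AR', 'MO'], '42223': ['KY', 'TN']}
```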
That statement isn't actually true in all geographical areas. Australia has a few sister cities that straddle state boundaries yet share the same postcode.
And 3NF, while incredibly useful, is not inviolable. I've sometimes reverted some table information back to 2NF for performance reasons.
Nope. There are some zip codes that cross state lines. See Wikipedia for some examples. Furthermore, normalization reduces redundancy, while addresses are actually fairly complicated things, and it is easy to get one component wrong. Redundancy means that even if part of the address is wrong, there is a good chance that the mail will still get where it's going.
I recall a time when a hiker from Europe stayed at my fraternity, and wanted to send a thank-you note. He did not understand American addresses or geography very well, so when he sent the note it was addressed to "<fraternity name> <not quite correct name of university> New England? USA". The mail actually got there, amazingly enough.
Redundancy in addresses can be a very good thing, and you generally shouldn't assume more about an address than you need to. For instance, some people don't have a street number; you put "general delivery", and the mailman is expected to know where the letter goes (or you can pick it up at the post office if he doesn't).
There is a different issue. You might want to distinguish between the data that was entered (which could be conflicting) and the conclusions you draw from it.
3NF violation by example
Let's look at the below denormalized table for a blog posts project. It's not in third normal form; it's broken. Say there are multiple posts by the same author: we may update some rows and leave others unchanged, leaving the table data inconsistent.
This violates normalization because of a common way of describing tables in third normal form: every non-key attribute in the table must provide a fact about the key, the whole key, and nothing but the key. That's a play on words on what you swear in a US courtroom: to tell the truth, the whole truth, and nothing but the truth. The key in this case is the Post Id, and there is a non-key attribute, Author Email, that does not follow that rule, because it in fact tells us something about the author rather than the post. So the table violates third normal form and does not achieve the goals of normalization.
Hope this helps.
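The original table isn't reproduced here, so this SQLite sketch assumes a minimal version of it: hypothetical `posts_flat` columns (post_id, title, author_name, author_email). It shows the update anomaly described above, then the 3NF fix, where the author's email lives in its own `authors` table and is changed in exactly one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized: author_email depends on the author, not on post_id.
CREATE TABLE posts_flat (post_id INTEGER PRIMARY KEY, title TEXT,
                         author_name TEXT, author_email TEXT);
INSERT INTO posts_flat VALUES
    (1, 'First post',  'alice', 'alice@old.example'),
    (2, 'Second post', 'alice', 'alice@old.example');

-- Normalized (3NF): each author fact is stored exactly once.
CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, title TEXT,
                    author_id INTEGER REFERENCES authors(author_id));
INSERT INTO authors VALUES (1, 'alice', 'alice@old.example');
INSERT INTO posts VALUES (1, 'First post', 1), (2, 'Second post', 1);
""")

# Update anomaly: changing the email on only one row leaves two "truths".
conn.execute("UPDATE posts_flat SET author_email = 'alice@new.example' "
             "WHERE post_id = 1")
flat = conn.execute("SELECT DISTINCT author_email FROM posts_flat "
                    "WHERE author_name = 'alice'").fetchall()

# In 3NF the same change is a single-row update that every post sees.
conn.execute("UPDATE authors SET email = 'alice@new.example' WHERE author_id = 1")
joined = conn.execute(
    "SELECT DISTINCT a.email FROM posts p JOIN authors a "
    "ON a.author_id = p.author_id"
).fetchall()
print(len(flat), joined)  # 2 [('alice@new.example',)]
```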
I expect the column to be a VARCHAR2, in my Oracle Database.
US Zips are 9.
Canadian is 7.
I am thinking 32 characters would be reasonable upper limit
What am I missing?
[EDIT]
TIL: 12 is a reasonable answer to the question
Thanks to everyone who contributed.
Skimming through Wikipedia's Postal Codes page, 32 characters should be more than enough. I would say even 16 characters is good.
As already raised by @neil-mcguigan, Wikipedia has a decent page on the topic. Based on that, 12 characters should do it: http://en.wikipedia.org/wiki/List_of_postal_codes
The Wikipedia article lists ~254 countries, which is pretty good considering the UPU (Universal Postal Union) has 192 member countries.
Why would you declare a field size larger than the actual data you are expecting to store in it?
If the initial version of your application is going to support US and Canadian addresses (which I'm inferring from the fact that you call out those sizes in your question), I'd declare the field as VARCHAR2(9) (or VARCHAR2(10) if you intend to store the hyphen in ZIP+4 fields). Even looking at the posts others have made about postal codes across countries, VARCHAR2(9) or VARCHAR2(10) would be sufficient for most if not all other countries.
Down the line, you can always ALTER the column to increase the length should the need arise. But it is generally hard to prevent someone, somewhere from deciding to get "creative" and stuffing 50 characters into a VARCHAR2(50) field for one reason or another (e.g. because they want another line on a shipping label). You also have to deal with testing the boundary cases (will every application that displays a ZIP handle 50 characters?). And there is the fact that when clients retrieve data from the database, they generally allocate memory based on the maximum size of the data that will be fetched, not the actual length of a given row. Probably not a huge deal in this specific case, but 40 bytes per row could be a decent chunk of RAM in some situations.
As an aside, you might also consider storing (at least for US addresses) the ZIP code and the +4 extension separately. It is generally useful to be able to generate reports by geographical region, and you may frequently want to put everything in a ZIP code together rather than breaking it down by the +4 extension. At that point, it's useful to not have to try to SUBSTR out the first 5 characters for the ZIP code.
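If you do store the ZIP and the +4 extension separately, the input has to be split once on the way in. A minimal Python sketch, assuming input of the form 12345, 12345-6789 or 123456789; the pattern and the helper name `split_zip` are illustrative, and it only checks the shape, not whether the code actually exists.

```python
import re

# Five ASCII digits, optionally followed by an (optionally hyphenated) +4 part.
ZIP_RE = re.compile(r"([0-9]{5})(?:-?([0-9]{4}))?")


def split_zip(raw: str):
    """Return (zip5, plus4), with plus4 None when absent; raise on bad input."""
    m = ZIP_RE.fullmatch(raw.strip())
    if not m:
        raise ValueError(f"not a US ZIP or ZIP+4 code: {raw!r}")
    return m.group(1), m.group(2)


print(split_zip("12345-6789"), split_zip("02134"))
```

Grouping reports by the five-digit column then needs no SUBSTR at all.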
Normalization? Postal codes might be used more than once, and might be related to street names or town names. Separate table(s).
What you're missing is a reason why you need the postal code to be handled specially.
If you don't really need to WORK with a postal code, I would suggest not worrying about it. By work, I mean do special processing for rather than just use to print address labels and so on.
Simply create three or four address fields of VARCHAR2(50) [for example] and let the user input whatever they want.
Do you really need to group your orders or transactions by postcode? I think not, since different countries have vastly different schemes for this field.
Canadian Postal Codes are only 6 characters, in the form of letters and numbers (LNLNLN).
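The 6-vs-7 discrepancy in the question comes from the space conventionally written between the two halves ("A1A 1A1"). A Python sketch that accepts either form and stores one canonical 7-character form; the helper name is hypothetical, and the pattern deliberately does not enforce Canada Post's additional rules about which letters may appear.

```python
import re

# Letter-digit-letter, optional space, digit-letter-digit (shape check only).
CA_POSTAL_RE = re.compile(r"([A-Z][0-9][A-Z]) ?([0-9][A-Z][0-9])")


def normalize_ca_postal(raw: str):
    """Return the code as 'A1A 1A1', or None if it does not match the shape."""
    m = CA_POSTAL_RE.fullmatch(raw.strip().upper())
    return f"{m.group(1)} {m.group(2)}" if m else None


print(normalize_ca_postal("k1a0b1"))
```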
UK have published standards: UK Government Data Standards Catalogue
Max 35 characters per line
International Postal Address:
Minimum of 2 lines and maximum of 5 lines for the postal delivery point details, plus 1 line for country and 1 line for postcode/zip code
The UK postal code length is:
Minimum 6 and Maximum 8 characters
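A sketch of checking an international address against the limits quoted above (at most 35 characters per line, 2 to 5 delivery-point lines, plus a country line and a postcode line). The function name and the choice to return a list of problems are assumptions, the postcode is optional because not every country uses one, and this is not an official implementation of the UK standard.

```python
MAX_LINE = 35
MIN_LINES, MAX_LINES = 2, 5


def check_international_address(lines, country, postcode=None):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    delivery = [line.strip() for line in lines if line.strip()]
    if not MIN_LINES <= len(delivery) <= MAX_LINES:
        problems.append(
            f"expected {MIN_LINES}-{MAX_LINES} delivery lines, got {len(delivery)}")
    for i, line in enumerate(delivery, start=1):
        if len(line) > MAX_LINE:
            problems.append(f"line {i} is {len(line)} characters (max {MAX_LINE})")
    if not country or not country.strip():
        problems.append("country line is required")
    elif len(country.strip()) > MAX_LINE:
        problems.append("country line is too long")
    # Postcode is optional: some countries do not use postcodes at all.
    if postcode is not None and len(postcode.strip()) > MAX_LINE:
        problems.append("postcode line is too long")
    return problems


print(check_international_address(["1 Rue Exemple", "Paris"], "France", "75001"))
```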
If you want to integrate postal codes into your database, the GeoNames database is the best to use. Although it is tough to use and understand, it is the largest geographical database freely available to users like us.
Most other such databases have more or less the same data and structure; they just remove some extra/redundant information. If you are only building a low-load system, use their free web services: the limits are attractive, and they provide an easier interface using JSON and AJAX. You can view the limits here
For your information varchar(20) is sufficient for storing postal codes
What is the "best" way to store international addresses in a database? Answer in the form of a schema and an explanation of the reasons why you chose to normalize (or not) the way you did. Also explain why you chose the type and length of each field.
Note: You decide what fields you think are necessary.
Plain freeform text.
Validating all the world's post/zip codes is too hard; a fixed list of countries is too politically sensitive; mandatory state/region/other administrative subdivision is just plain inappropriate (all too often I'm asked which county I live in--when I don't, because Greater London is not a county at all).
More to the point, it's simply unnecessary. Your application is highly unlikely to be modelling addresses in any serious way. If you want a postal address, ask for the postal address. Most people aren't so stupid as to put in something other than a postal address, and if they do, they can kiss their newly purchased item bye-bye.
The exception to this is if you're doing something that's naturally constrained to one country anyway. In this situation, you should ask for, say, the { postcode, house number } pair, which is enough to identify a postal address. I imagine you could achieve similar things with the extended zip code in the US.
In the past I've modeled forms that needed to be international on the UPS/FedEx shipping address forms on their websites (I figured if they don't know how to handle an international order, we are all hosed). The fields they use can serve as a reference for setting up your schema.
In general, you need to understand why you want an address. Is it for shipping/mailing? Then there is really only one requirement: keep the country separate. The other lines are freeform, to be filled in by the user. The reason for this is the common forwarding strategy for mail: any incoming mail for a foreign country is forwarded without looking at the other address lines. Hence, the detailed information is parsed only by the mail sorter located in the destination country itself, which, like the receiver, is familiar with national conventions.
(UPS may bunch together some small European countries, e.g. all the Low Countries are probably served from Belgium; the idea still holds.)
I think adding country, city and address text will be fine. Country and city should be separate fields for reporting: managers always ask for these kinds of reports when you least expect it, and I'd rather not run a LIKE query over a large database.
Not to give Facebook undue respect. However, the overall structure of the database seems to be overlooked in many web applications launching every day. Obviously I don't think there is a perfect solution that covers all the potential variables with address structure without some hard work. That said, combined with autocomplete Facebook manages to take location input data and eliminate a majority of their redundant entries. They do this by organizing their database well enough to provide autocomplete information in a low cost, low error way to the client in real time allowing them to more or less choose the correct location from an existing list.
I think the best solution is to access a third-party database that covers your desired geographic scope and use it to initially seed your user location information. This will allow you to avoid doing the groundwork of creating your own. With any luck you can reduce the load on your server by allowing your new users to receive the correct autocomplete information directly from your third-party supplier. Eventually you will be able to fill most autocomplete suggestions for location information such as city, country, etc. from information contained in your own database, built from user input data.
You need to provide a bit more details about how you are planning to use the data. For example, fields like City, State, Country can either be text in the single table, or be codes which are linked to a separate table with a Foreign Key.
Simplest would be
Address_Line_01 (Required, Non blank)
Address_Line_02
Address_Line_03
Landmark
City (Required)
Pin (Required)
Province_District
State (Required)
Country (Required)
All the above can be Text/Unicode with appropriate field lengths.
Phone Numbers as applicable.