Database Design, Google Maps, and Address Fields


I want to collect the addresses of my users so that I can plot them on a Google Map. I know I need to store the lat/long values of their address, which I can get from the Google Maps API.
I'm looking for recommendations on how to divide the various address parts and save them to the database. I commonly see things like this:
Address Line 1
Address Line 2
City
State/Region/Province
ZIP/Postal Code
Country
Google breaks down these address components differently, though. See, for example: http://maps.googleapis.com/maps/api/geocode/json?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false
I'm not sure what parts of Google address components equate to what is commonly seen in web forms (e.g. is administrative_area_level_1 always the state/region/province?). I'd like to store the various address components as atomically as possible so that I have the greatest control when displaying the address information later on.
NOTE: I also plan to store the formatted_address as I think that could be useful in some cases.
So, what should I store in my database?

This section of the Geocoding documentation provides a pretty good description of the types of data you get back from the reverse geocoder. These data types were developed by Google to describe any address in the world, so are probably a good starting point.
Based on the following quote, administrative_area_level_1 describes subnational jurisdictions: states in the US and Australia, prefectures in Japan, regions in France, etc.:
"administrative_area_level_1 indicates a first-order civil entity below the country level. Within the United States, these administrative levels are states. Not all nations exhibit these administrative levels."
You will probably need to be careful about the assumptions you make about these data types for other countries. For instance, the administrative_area_level_1 for addresses in London is England. But with a good understanding of this schema, you should be able to render locale-friendly addresses anywhere in the world.
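As a rough sketch of how the geocoder's components could be mapped onto conventional form fields, something like the following might work. The column names in FIELD_MAP are hypothetical, and SAMPLE_RESULT is a hand-abbreviated illustration of a single geocoder result (with approximate coordinates), not a live API response. Any component a country does not have simply comes back as None.

```python
# Sketch: map Google geocoder address components onto database columns.
# SAMPLE_RESULT is a hand-abbreviated illustration, not a live API response.
SAMPLE_RESULT = {
    "formatted_address": "1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA",
    "address_components": [
        {"long_name": "1600", "short_name": "1600", "types": ["street_number"]},
        {"long_name": "Amphitheatre Parkway", "short_name": "Amphitheatre Pkwy",
         "types": ["route"]},
        {"long_name": "Mountain View", "short_name": "Mountain View",
         "types": ["locality", "political"]},
        {"long_name": "California", "short_name": "CA",
         "types": ["administrative_area_level_1", "political"]},
        {"long_name": "United States", "short_name": "US",
         "types": ["country", "political"]},
        {"long_name": "94043", "short_name": "94043", "types": ["postal_code"]},
    ],
    "geometry": {"location": {"lat": 37.422, "lng": -122.084}},  # approximate
}

# Hypothetical column name -> (component type, which name variant to keep).
FIELD_MAP = {
    "street_number": ("street_number", "long_name"),
    "street": ("route", "long_name"),
    "city": ("locality", "long_name"),
    "region": ("administrative_area_level_1", "short_name"),
    "postal_code": ("postal_code", "long_name"),
    "country": ("country", "short_name"),
}


def extract_fields(result):
    """Flatten one geocoder result into a row of nullable columns."""
    by_type = {}
    for component in result.get("address_components", []):
        for component_type in component["types"]:
            # Keep the first component seen for each type.
            by_type.setdefault(component_type, component)

    row = {}
    for column, (component_type, name_key) in FIELD_MAP.items():
        component = by_type.get(component_type)
        row[column] = component[name_key] if component else None

    # Store the formatted address and coordinates alongside the parts.
    row["formatted_address"] = result.get("formatted_address")
    location = result.get("geometry", {}).get("location", {})
    row["lat"] = location.get("lat")
    row["lng"] = location.get("lng")
    return row


row = extract_fields(SAMPLE_RESULT)
print(row["city"], row["region"], row["country"])
```

Keeping every column nullable matters here, since not every country returns every component type.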

Related

Can there be a postal code area which falls on to multiple time zones?

Can there be a city/place which has some postal code 'XXXX' and it falls in to 2 or more time zones?
I'm not sure if this question is even valid or not. But because of the overwhelming information regarding locations and time zones in the Internet I got a bit confused.
(My requirement is actually to store user's country, city, postal code and time zone information in a database)

Yes. In theory, postal code boundaries and time zone boundaries are separate entities, and thus there can be oddities like one postal code with multiple time zones. It is possible.
However, it is hard to find real-world examples. Cities that straddle official boundaries of state, county or country borders tend to have postal codes split along such boundaries, and may use a single time zone by convention - even if it's not the "official" one. (Though, that doesn't mean there aren't any.)
Additionally, consider that many postal codes are not directly related to physical location, but rather are used for routing mail in a particular way. For example, several postal codes in the US are used for sending mail to members of various branches of the military - even if they happen to be stationed overseas. Therefore, not every postal code can be mapped to a time zone.
A better approach is to use the information you have to approximate a latitude and longitude. This process is called "geocoding" and there are many online services that can perform this (including Google Maps). Then, use one of the techniques listed here (including Google Maps) to obtain a time zone ID for that location.

Is there a public geolocation system that I can query for names of local cities?

I'm looking to attempt to simplify address entry into a system where the city textbox has autosuggest initially populated by the user's geolocation. In the past it has seemed that autosuggesting the city name is prohibitively costly without knowing the province/state/country first but it doesn't make sense to require the user to enter the address backwards as we don't think about address information this way. On the other hand, not autosuggesting the city name means we end up with all sorts of weird and wonderful entries for mis-spelled cities from around the world.
I was wondering if there's a service that I can query that would automatically respond with the most appropriate city names according to not only what the user enters in the textbox, but the location of that user based on the country and political boundary they fall within?
For instance, if I am in Canada [as I am] and I enter 'Mi' then I'd be presented with all cities within Canada starting with 'Mi' until it was determined that the information I was entering wasn't Canadian at which point, it would use the next most likely configured country based on our usage pattern - i.e. it would check the U.S. next, followed by Mexico and then other less likely destinations. I can write all this myself if I had the database but I don't know where I can find one and my suspicion is that it would be less scalable than querying a pre-existing service on the web.

Looks as though MaxMind offers a free database that you could download in CSV:
There's an online demo to test it a bit if you'd like, but no way to query it through a web service.
IPInfoDB also has their database available for download - they have an XML API, but it only supports looking up the city/country for a particular IP. You're trying to do something a little broader than that, looking for every city in a particular country, with country selected based on IP. I wouldn't expect that there's a web service for that; it's a pretty specific requirement.
Edited to add: You could use the IPInfoDB API to look up the country though, and then generate the autocomplete suggestions from a local country/city database. That way all the IP-geolocation wouldn't need to be done locally. There are various places that you can get a list of cities in a particular country. For example, here are some comprehensive lists maintained by the National Geospatial-Intelligence Agency.
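To illustrate the "local country/city database" idea, here is a minimal sketch of prefix matching that ranks suggestions by a preferred country order (the user's detected country first, then the next most likely ones). The CITIES list and the suggest_cities function are made up for illustration; a real system would load a downloaded city database instead.

```python
# Hypothetical in-memory city list: (city name, ISO 3166-1 alpha-2 country code).
CITIES = [
    ("Milton", "CA"), ("Mississauga", "CA"), ("Miramichi", "CA"),
    ("Miami", "US"), ("Milwaukee", "US"), ("Minneapolis", "US"),
    ("Mexicali", "MX"), ("Milan", "IT"),
]


def suggest_cities(prefix, country_order, cities=CITIES, limit=5):
    """Return up to `limit` cities starting with `prefix`.

    Cities in countries listed earlier in `country_order` come first;
    countries not listed at all are ranked last.
    """
    wanted = prefix.casefold()
    rank = {code: position for position, code in enumerate(country_order)}
    matches = [
        (name, code) for name, code in cities
        if name.casefold().startswith(wanted)
    ]
    # Sort by country preference, then alphabetically within a country.
    matches.sort(key=lambda match: (rank.get(match[1], len(rank)), match[0]))
    return matches[:limit]


# A user geolocated to Canada typing "Mi": Canadian cities are offered first.
for name, code in suggest_cities("Mi", ["CA", "US", "MX"]):
    print(name, code)
```

The country order would come from the IP lookup plus your own usage patterns; the matching itself stays local and cheap.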

Best practices for storing postal addresses in a database (RDBMS)?

Are there any good references for best practices for storing postal addresses in an RDBMS? It seems there are lots of tradeoffs that can be made and lots of pros and cons to each to be evaluated -- surely this has been done time and time again? Maybe someone has at least written down some lessons learned somewhere?
Examples of the tradeoffs I am talking about are storing the zipcode as an integer vs a char field, should house number be stored as a separate field or part of address line 1, should suite/apartment/etc numbers be normalized or just stored as a chunk of text in address line 2, how do you handle zip +4 (separate fields or one big field, integer vs text)? etc.
I'm primarily concerned with U.S. addresses at this point but I imagine there are some best practices in regards to preparing yourself for the eventuality of going global as well (e.g. naming fields appropriately like region instead of state or postal code instead of zip code, etc.).

For more international use, one schema to consider is the one used by Drupal Address Field. It's based on the xNAL standard, and seems to cover most international cases. A bit of digging into that module will reveal some nice pearls for interpreting and validating addresses internationally. It also has a nice set of administrative areas ( province, state, oblast, etc ) with ISO codes.
Here's the gist of the schema, copied from the module page:
country => Country (always required, 2 character ISO code)
name_line => Full name (default name entry)
first_name => First name
last_name => Last name
organisation_name => Company
administrative_area => State / Province / Region (ISO code when available)
sub_administrative_area => County / District (unused)
locality => City / Town
dependent_locality => Dependent locality (unused)
postal_code => Postal code / ZIP Code
thoroughfare => Street address
premise => Apartment, Suite, Box number, etc.
sub_premise => Sub premise (unused)
A few lessons I've learned:
Don't store anything numerically.
Store country and administrative area as ISO codes where possible.
When you don't know, be lax about requiring fields. Some countries may not use fields you take for granted, even basic things like locality & thoroughfare.
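A minimal sketch of how those lessons might translate into a table, using SQLite from Python. The table name, the subset of columns, and the sample rows are assumptions for illustration only (this is not the Drupal module's actual schema definition): everything is text, only the country code is required, and the other fields are nullable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Everything is TEXT; only the 2-character country code is mandatory.
conn.execute("""
    CREATE TABLE address (
        id                  INTEGER PRIMARY KEY,
        country             TEXT NOT NULL CHECK (length(country) = 2),
        administrative_area TEXT,
        locality            TEXT,
        postal_code         TEXT,
        thoroughfare        TEXT,
        premise             TEXT
    )
""")

# Illustrative rows: the second one has no administrative area or locality.
conn.executemany(
    "INSERT INTO address (country, administrative_area, locality,"
    " postal_code, thoroughfare) VALUES (?, ?, ?, ?, ?)",
    [
        ("US", "CA", "Mountain View", "94043", "1600 Amphitheatre Parkway"),
        ("SG", None, None, "018956", "1 Example Street"),
    ],
)

rows = conn.execute(
    "SELECT country, locality, postal_code FROM address ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

Because postal_code is text, a code with a leading zero comes back exactly as it was entered.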

As an 'international' user, there is nothing more frustrating than dealing with a website that is oriented around only US-format addresses. It's a little rude at first, but becomes a serious problem when the validation is also over-zealous.
If you are concerned with going global, the only advice I have is to keep things free-form. Different countries have different conventions - in some, the house number comes before the street name, in some it comes after. Some have states, some regions, some counties, some combinations of those. Here in the UK, the zipcode is not a zipcode, it's a postcode containing both letters and numbers.
I'd advise simply ~10 lines of variable-length strings, together with a separate field for a postcode (and be careful how you describe that to cope with national sensibilities). Let the user/customer decide how to write their addresses.

If you need comprehensive information about how other countries use postal addresses, here's a very good reference link (Columbia University):
Frank's Compulsive Guide to Postal Addresses
Effective Addressing for International Mail

You should definitely consider storing house number as a character field rather than a number, because of special cases such as "half-numbers", or my current address, which is something like "129A" — but the A is not considered as an apartment number for delivery services.

I've done this (rigorously model address structures in a database), and I would never do it again. You can't imagine how crazy the exceptions are that you'll have to take into account as a rule.
I vaguely recall some issue with Norwegian postal codes (I think), which were all 4 positions, except Oslo, which had 18 or so.
I'm positively sure that from the moment we started using the geographically correct ZIP codes for all of our own national addresses, quite a few people started complaining that their mail arrived too late. Turned out those people were living near a borderline between postal areas, and despite the fact that someone really lived in postal area, say, 1600, in reality his mail should be addressed to postal area 1610, because in reality it was that neighbouring postal area that actually served him, so sending his mail to his correct postal area would take that mail a couple of days longer to arrive, because of the unwanted intervention that was required in the correct postal office to forward it to the incorrect postal area ...
(We ended up registering those people with an address abroad in the country with ISO-code 'ZZ'.)

Unless you are going to do maths on the street numbers or zip / postal codes, you are just inviting future pain by storing them as numerics.
You might save a few bytes here and there, and maybe get a faster index, but what do you do when the US postal service, or that of whatever other country you are dealing with, decides to introduce alphas into the codes?
The cost of disk space is going to be a lot cheaper than the cost of fixing it later on... y2k anybody?
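A quick way to see the pain, sketched in Python: the helper below (a made-up name) mimics what an integer column would do to a postal code and reports whether the code survives the round trip.

```python
def survives_integer_storage(code):
    """Return True if `code` comes back unchanged after being stored as an int."""
    try:
        stored = int(code)  # what a numeric column would keep
    except ValueError:
        return False  # letters or spaces cannot be stored at all
    return str(stored) == code  # leading zeros are silently dropped


for code in ["94043", "02134", "K1A 0B1"]:
    print(code, survives_integer_storage(code))
```

A ZIP code with a leading zero loses the zero, and an alphanumeric code such as a Canadian postal code cannot be stored at all.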

Adding to what @Jonathan Leffler and @Paul Fisher have said:
If you ever anticipate having postal addresses for Canada or Mexico added to your requirements, storing postal-code as a string is a must. Canada has alpha-numeric postal codes and I don't remember what Mexico's look like off the top of my head.

You should certainly consult "Is this a good way to model address information in a relational database", but your question is not a direct duplicate of that.
There are surely a lot of pre-existing answers (check out the example data models at DatabaseAnswers, for example). Many of the pre-existing answers are defective under some circumstances (not picking on DB Answers at all).
One major issue to consider is the scope of the addresses. If your database must deal with international addresses, you have to be more flexible than if you only have to deal with addresses in one country.
In my view, it is often (which does not mean always) sensible to both record the 'address label image' of the address and separately analyze the content. This allows you to deal with differences between the placement of postal codes, for example, between different countries. Sure, you can write an analyzer and a formatter that handle the eccentricities of different countries (for instance, US addresses have 2 or 3 lines; by contrast, British addresses can have considerably more; one address I write to periodically has 9 lines). But it can be easier to have the humans do the analysis and formatting and let the DBMS just store the data.

I've found that listing all possible fields from smallest discrete unit to largest is the easiest way. Users will fill in the fields they see fit. My address table looks like this:
*********************************
Field             Type
*********************************
address_id (PK)   int
unit              string
building          string
street            string
city              string
region            string
country           string
address_code      string
*********************************

Where's the "trade off" in storing the ZIP as a NUMBER or VARCHAR? That's just a choice -- it's not a trade off unless there are benefits to both and you have to give up some benefits to get others.
Unless the sum of zips has any meaning at all, Zips as number is not useful.

This might be an overkill, but if you need a solution that would work with multiple countries and you need to programmatically process parts of the address:
you could have country specific address handling using two tables: One generic table with 10 VARCHAR2 columns, 10 Number columns, another table which maps these fields to prompts and has a country column tying an address structure to a country.
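A rough sketch of the two-table idea, with plain dictionaries standing in for the tables. The column names (text_01, text_02, ...) and the prompts are invented for illustration; the point is only that the generic row stays the same shape while the per-country mapping decides what each column means.

```python
# Stand-in for the mapping table: country -> {generic column: prompt}.
# Column names and prompts are hypothetical.
FIELD_PROMPTS = {
    "US": {
        "text_01": "Street address",
        "text_02": "City",
        "text_03": "State",
        "text_04": "ZIP code",
    },
    "GB": {
        "text_01": "Street address",
        "text_02": "Town",
        "text_03": "Postcode",
    },
}


def label_address(country, generic_row):
    """Translate a generic row into {prompt: value} for the given country."""
    prompts = FIELD_PROMPTS[country]
    return {prompt: generic_row.get(column) for column, prompt in prompts.items()}


# Stand-in for one row of the generic address table.
generic_row = {"text_01": "10 Downing Street", "text_02": "London", "text_03": "SW1A 2AA"}
print(label_address("GB", generic_row))
```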

If you ever have to verify an address or use it to process credit card payments, you'll at least need a little structure. A free-form block of text does not work very well for that.
Zip code is a common optional field for validating payment card transactions without using the whole address. So have a separate and generously sized field for that (at least 10 chars).

Inspired by Database Answers
Line1
Line2
Line3
City
Country_Province
PostalCode
CountryId
OtherDetails

At the moment, I'm developing an international ecommerce website.
It should cover almost all addresses in this world as shown below:
*****************************************************************
Type          Field name  Displayed name in your form
*****************************************************************
INT           id (PK)
VARCHAR(100)  building    Apt, office, suite, etc. (Optional)
VARCHAR(100)  street      Street address
VARCHAR(100)  city        City
VARCHAR(100)  state       State, province or prefecture
VARCHAR(100)  zip_code    Zip code
VARCHAR(100)  country     Country
*****************************************************************

I would just put all the fields together in a large NVARCHAR(1000) field, with a textarea element for the user to enter the value (unless you want to perform analysis on e.g. zip codes). All those address line 1, address line 2, etc. inputs are just so annoying if you have an address that doesn't fit well with that format (and, you know, there are other countries than the US).

Best practices for consistent and comprehensive address storage in a database [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Are there any best practices (or even standards) to store addresses in a consistent and comprehensive way in a database ?
To be more specific, I believe at this stage that there are two cases for address storage :
you just need to associate an address to a person, a building or any item (the most common case). Then a flat table with text columns (address1, address2, zip, city) is probably enough. This is not the case I'm interested in.
you want to run statistics on your addresses : how many items in a specific street, or city or... Then you want to avoid misspellings of any sorts, and ensure consistency. My question is about best practices in this specific case : what are the best ways to model a consistent address database ?
A country specific design/solution would be an excellent start.
ANSWER : There does not seem to exist a perfect answer to this question yet, but :
xAL, as suggested by Hank, is the closest thing to a global standard that popped up. It seems to be quite an overkill though, and I am not sure many people would want to implement it in their database...
To start one's own design (for a specific country), Dave's link to the Universal Postal Union (UPU) site is a very good starting point.
As for France, there is a norm (non official, but de facto standard) for addresses, which bears the lovely name of AFNOR XP Z10-011 (French only), and has to be paid for. The UPU description for France is based on this norm.
I happened to find the equivalent norm for Sweden : SS 613401.
At European level, some effort has been made, resulting in the norm EN 14142-1. It is obtainable via CEN national members.

I've been thinking about this myself as well. Here are my loose thoughts so far, and I'm wondering what other people think.
xAL (and its sister that includes personal names, XNAL) is used by both Google and Yahoo's geocoding services, giving it some weight. But since the same address can be described in xAL in many different ways--some more specific than others--then I don't see how xAL itself is an acceptable format for data storage. Some of its field names could be used, however, but in reality the only basic format that can be used among the 16 countries that my company ships to is the following:
enum address-fields
{
    name,
    company-name,
    street-lines[], // up to 4 free-type street lines
    county/sublocality,
    city/town/district,
    state/province/region/territory,
    postal-code,
    country
}
That's easy enough to map into a single database table, just allowing for NULLs on most of the columns. And it seems that this is how Amazon and a lot of organizations actually store address data. So the question that remains is how should I model this in an object model that is easily used by programmers and by any GUI code. Do we have a base Address type with subclasses for each type of address, such as AmericanAddress, CanadianAddress, GermanAddress, and so forth? Each of these address types would know how to format themselves and optionally would know a little bit about the validation of the fields.
They could also return some type of metadata about each of the fields, such as the following pseudocode data structure:
structure address-field-metadata
{
    field-number, // corresponds to the enumeration above
    field-index, // the order in which the field is usually displayed
    field-name, // a "localized" name; US == "State", CA == "Province", etc
    is-applicable, // whether or not the field is even looked at / valid
    is-required, // whether or not the field is required
    validation-regex, // an optional regex to apply against the field
    allowed-values[] // an optional array of specific values the field can be set to
}
In fact, instead of having individual address objects for each country, we could take the slightly less object-oriented approach of having an Address object that eschews .NET properties and uses an AddressStrategy to determine formatting and validation rules:
object address
{
    set-field(field-number, field-value),
    address-strategy
}
object address-strategy
{
    validate-field(field-number, field-value),
    cleanse-address(address),
    format-address(address, formatting-options)
}
When setting a field, that Address object would invoke the appropriate method on its internal AddressStrategy object.
The reason for using a SetField() method approach rather than properties with getters and setters is so that it is easier for code to actually set these fields in a generic way without resorting to reflection or switch statements.
You can imagine the process going something like this:
GUI code calls a factory method or some such to create an address based on a country. (The country dropdown, then, is the first thing that the customer selects, or has a good guess pre-selected for them based on culture info or IP address.)
GUI calls address.GetMetadata() or a similar method and receives a list of the AddressFieldMetadata structures as described above. It can use this metadata to determine what fields to display (ignoring those with is-applicable set to false), what to label those fields (using the field-name member), display those fields in a particular order, and perform cursory, presentation-level validation on that data (using the is-required, validation-regex, and allowed-values members).
GUI calls the address.SetField() method using the field-number (which corresponds to the enumeration above) and its given values. The Address object or its strategy can then perform some advanced address validation on those fields, invoke address cleaners, etc.
There could be slight variations on the above if we want to make the Address object itself behave like an immutable object once it is created. (Which I will probably try to do, since the Address object is really more like a data structure, and probably will never have any true behavior associated with itself.)
Does any of this make sense? Am I straying too far off of the OOP path? To me, this represents a pretty sensible compromise between being so abstract that implementation is nigh-impossible (xAL) versus being strictly US-biased.
Update 2 years later: I eventually ended up with a system similar to this and wrote about it at my defunct blog.
I feel like this solution is the right balance between legacy data and relational data storage, at least for the e-commerce world.
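For readers who want something more concrete than the pseudocode, here is one possible Python rendering of the Address / AddressStrategy idea. It is only a sketch under my own assumptions: the class names, the reduced set of fields, and the US validation patterns are illustrative, not the system described above. Fields are validated as they are set, and formatting follows the order given by the metadata.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AddressField(Enum):
    STREET = 1
    CITY = 2
    REGION = 3
    POSTAL_CODE = 4


@dataclass(frozen=True)
class FieldMetadata:
    field_id: AddressField
    index: int                    # usual display order
    label: str                    # localized prompt, e.g. "State" or "Province"
    required: bool = False
    regex: Optional[str] = None   # optional validation pattern


class AddressStrategy:
    """Country-specific rules; fields without metadata are not applicable."""

    def __init__(self, metadata):
        self._meta = {m.field_id: m for m in metadata}

    def metadata(self):
        return sorted(self._meta.values(), key=lambda m: m.index)

    def validate_field(self, field_id, value):
        meta = self._meta.get(field_id)
        if meta is None:
            raise ValueError(f"{field_id.name} is not applicable here")
        if meta.required and not value:
            raise ValueError(f"{meta.label} is required")
        if value and meta.regex and not re.fullmatch(meta.regex, value):
            raise ValueError(f"{meta.label} has an unexpected format")

    def format_address(self, address):
        lines = (address.get(m.field_id) for m in self.metadata())
        return "\n".join(line for line in lines if line)


class Address:
    def __init__(self, strategy):
        self._strategy = strategy
        self._values = {}

    def set_field(self, field_id, value):
        # Generic setter: validation is delegated to the strategy.
        self._strategy.validate_field(field_id, value)
        self._values[field_id] = value

    def get(self, field_id):
        return self._values.get(field_id)

    def format(self):
        return self._strategy.format_address(self)


# Illustrative US rules; the patterns are simplified assumptions.
US_STRATEGY = AddressStrategy([
    FieldMetadata(AddressField.STREET, 1, "Street", required=True),
    FieldMetadata(AddressField.CITY, 2, "City", required=True),
    FieldMetadata(AddressField.REGION, 3, "State", required=True, regex=r"[A-Z]{2}"),
    FieldMetadata(AddressField.POSTAL_CODE, 4, "ZIP code", required=True,
                  regex=r"\d{5}(-\d{4})?"),
])

address = Address(US_STRATEGY)
address.set_field(AddressField.STREET, "1600 Amphitheatre Parkway")
address.set_field(AddressField.CITY, "Mountain View")
address.set_field(AddressField.REGION, "CA")
address.set_field(AddressField.POSTAL_CODE, "94043")
print(address.format())
```

Note that this sketch only checks a required field when it is set; a real implementation would also need a whole-address check for required fields that were never set.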

I'd use an Address table, as you've suggested, and I'd base it on the data tracked by xAL.

In the UK there is a product called PAF from Royal Mail
This gives you a unique key per address - there are hoops to jump through, though.

I basically see 2 choices if you want consistency:
Data cleansing
Basic data table look ups
Ad 1. I work with the SAS System, and SAS Institute offers a tool for data cleansing - this basically performs some checks and validations on your data, and suggests that "Abram Lincoln Road" and "Abraham Lincoln Road" be merged into the same street. I also think it draws on national databases containing city-postal code matches and so on.
Ad 2. You build up a multiple choice list (ie basic data), and people adding new entries pick from existing entries in your basic data. In your fact table, you store keys to street names instead of the street names themselves. If you detect a spelling error, you just correct it in your basic data, and all instances are corrected with it, through the key relation.
Note that these options don't rule out each other, you can use both approaches at the same time.
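Option 2 can be sketched with SQLite from Python. The table and column names here are made up for illustration: the fact table (item) stores only a key into the street table, so correcting a misspelling once in the basic data fixes every row that refers to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Basic data: one row per street name.
    CREATE TABLE street (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Fact table: stores a key to the street, not the name itself.
    CREATE TABLE item (
        id           INTEGER PRIMARY KEY,
        street_id    INTEGER NOT NULL REFERENCES street(id),
        house_number TEXT
    );
    INSERT INTO street (id, name) VALUES (1, 'Abram Lincoln Road');
    INSERT INTO item (street_id, house_number) VALUES (1, '12'), (1, '14A');
""")

# Correct the spelling once, in the basic data only.
conn.execute(
    "UPDATE street SET name = ? WHERE id = ?", ("Abraham Lincoln Road", 1)
)

rows = conn.execute("""
    SELECT item.house_number, street.name
    FROM item JOIN street ON street.id = item.street_id
    ORDER BY item.id
""").fetchall()
for row in rows:
    print(row)
```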

In the US, I'd suggest choosing a National Change of Address vendor and model the DB after what they return.

The authorities on how addresses are constructed are generally the postal services, so for a start I would examine the data elements used by the postal services for the major markets you operate in.
See the website of the Universal Postal Union for very specific and detailed information on international postal address formats: http://www.upu.int/post_code/en/postal_addressing_systems_member_countries.shtml

"xAl is the closest thing to a global standard that popped up. It seems to be quite an overkill though, and I am not sure many people would want to implement it in their database..."
This is not a relevant argument. Implementing addresses is not a trivial task if the system needs to be "comprehensive and consistent" (i.e. worldwide). Implementing such a standard is indeed time-consuming, but it is nevertheless mandatory if you want to meet the specified requirement.

Normalize your database schema and you'll have the perfect structure for correct consistency. This is why:
http://weblogs.sqlteam.com/mladenp/archive/2008/09/17/Normalization-for-databases-is-like-Dependency-Injection-for-code.aspx

I asked something quite similar earlier: Dynamic contact information data/design pattern: Is this in any way feasible?.
The short answer: Storing addresses or any kind of contact information in a database is complex. The Extensible Address Language (xAL) link above has some interesting information that is the closest to a standard/best practice that I've come across...

What is the "best" way to store international addresses in a database?

What is the "best" way to store international addresses in a database? Answer in the form of a schema and an explanation of the reasons why you chose to normalize (or not) the way you did. Also explain why you chose the type and length of each field.
Note: You decide what fields you think are necessary.

Plain freeform text.
Validating all the world's post/zip codes is too hard; a fixed list of countries is too politically sensitive; mandatory state/region/other administrative subdivision is just plain inappropriate (all too often I'm asked which county I live in--when I don't, because Greater London is not a county at all).
More to the point, it's simply unnecessary. Your application is highly unlikely to be modelling addresses in any serious way. If you want a postal address, ask for the postal address. Most people aren't so stupid as to put in something other than a postal address, and if they do, they can kiss their newly purchased item bye-bye.
The exception to this is if you're doing something that's naturally constrained to one country anyway. In this situation, you should ask for, say, the { postcode, house number } pair, which is enough to identify a postal address. I imagine you could achieve similar things with the extended zip code in the US.

In the past I've modeled forms that needed to be international after the ups/fedex shipping address forms on their websites (I figured if they don't know how to handle an international order we are all hosed). The fields they use can be used as reference for setting up your schema.

In general, you need to understand why you want an address. Is it for shipping/mailing? Then there is really only one requirement, have the country separate. The other lines are freeform, to be filled in by the user. The reason for this is the common forwarding strategy for mail : any incoming mail for a foreign country is forwarded without looking at the other address lines. Hence, the detailed information is parsed only by the mail sorter located in the country itself. Like the receiver, they'll be familiar with national conventions.
(UPS may bunch together some small European countries, e.g. all the Low Countries are probably served from Belgium - the idea still holds.)

I think adding country/city and address text will be fine. Country and city should be separate for reporting. Managers always ask for the kinds of reports you do not expect, and I don't like running a LIKE query through a large database.

Not to give Facebook undue respect. However, the overall structure of the database seems to be overlooked in many web applications launching every day. Obviously I don't think there is a perfect solution that covers all the potential variables with address structure without some hard work. That said, combined with autocomplete Facebook manages to take location input data and eliminate a majority of their redundant entries. They do this by organizing their database well enough to provide autocomplete information in a low cost, low error way to the client in real time allowing them to more or less choose the correct location from an existing list.
I think the best solution is to access a third party database which contains your desired geographic scope and use it to initially seed your user location information. This will allow you to avoid doing the groundwork of creating your own. With any luck you can reduce the load on your server by allowing your new users to receive the correct autocomplete information directly off your third party supplier. Eventually you will be able to fill most autocomplete for location information such as city, country, etc. from information contained in your own database from user input data.

You need to provide a bit more details about how you are planning to use the data. For example, fields like City, State, Country can either be text in the single table, or be codes which are linked to a separate table with a Foreign Key.
Simplest would be
Address_Line_01 (Required, Non blank)
Address_Line_02
Address_Line_03
Landmark
City (Required)
Pin (Required)
Province_District
State (Required)
Country (Required)
All the above can be Text/Unicode with appropriate field lengths.
Phone Numbers as applicable.
