Finding lat/lng based on postal code - maps

I have a dataset of thousands of full addresses of businesses (specifically in the Netherlands, but I guess the question can apply everywhere).
I want to find the lat/lng so I can do distance calculations, but because of the size of the dataset I'm worried it's not a wise idea to do this using, for example, Google Maps.
Is there a webservice I could query to find all this info?

The Google Geocoder web service is available for this:
http://code.google.com/apis/maps/documentation/geocoding/index.html
It's free (unless you abuse it, or volumes get too big), and returns JSON or XML.

I've been using Google, but it misses many (Scandinavian) addresses that Yahoo catches. See http://developer.yahoo.com/maps/rest/V1/geocode.html and at least compare the two for your needs. If I were you, I would send every address that Google misses to Yahoo as a fallback (or the other way around).
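The fallback idea can be sketched roughly as follows. This is only an illustration: `primary_stub` and `secondary_stub` are hypothetical stand-ins for real service clients (they are not part of any Google or Yahoo library), and the coordinates in them are rounded values used purely as sample data.

```python
from typing import Callable, Iterable, Optional, Tuple

LatLng = Tuple[float, float]
Geocoder = Callable[[str], Optional[LatLng]]


def geocode_with_fallback(address: str, geocoders: Iterable[Geocoder]) -> Optional[LatLng]:
    """Try each geocoder in order and return the first hit, or None."""
    for geocode in geocoders:
        try:
            result = geocode(address)
        except Exception:
            # Treat a failing service like a miss and try the next one.
            result = None
        if result is not None:
            return result
    return None


# Hypothetical stand-ins for real geocoding clients (sample data only).
def primary_stub(address: str) -> Optional[LatLng]:
    known = {"Dam 1, Amsterdam": (52.37, 4.89)}
    return known.get(address)


def secondary_stub(address: str) -> Optional[LatLng]:
    known = {"Storgata 1, Oslo": (59.91, 10.75)}
    return known.get(address)


if __name__ == "__main__":
    services = [primary_stub, secondary_stub]
    for addr in ("Dam 1, Amsterdam", "Storgata 1, Oslo", "Unknown 99"):
        print(addr, "->", geocode_with_fallback(addr, services))
```

Swapping the order of the list gives the "other way around" variant without touching the function.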

Accurate postcode information is owned by someone in most jurisdictions, and they charge for supplying the lat/lng information. In the UK it is the Post Office; I don't know about the Netherlands, but this looks quite promising. Even Google's geocoder is not that accurate outside the US.
One thing I should mention is that the lat/lng will not be sufficient for you to calculate distances (unless you are going everywhere by crow). One of the real advantages of Google's service is that GDirections uses knowledge of the road system and estimates journey time. If you are solving some sort of travelling salesman problem, lat/lng alone is not going to give you a very good estimate of actual distance, especially in cities.
HTH
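For reference, the "by crow" distance mentioned above is the great-circle distance, which can be computed from two lat/lng pairs with the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km, so it is an approximation and says nothing about road distance or journey time.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


if __name__ == "__main__":
    # One degree of latitude along a meridian.
    print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 2))
```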

Not sure of the quality/accuracy of the geocode but this could be an option, http://www.opengeocoding.org/geocoding/geocod.html

Related

Spatial Search Objectify, appengine

I want to use Objectify for spatial search. I have entities that have a longitude and latitude associated with them. The latitude and longitude information is dynamic, e.g. service providers (such as electricians and carpenters) in a city. I want to implement a query that gives me the service providers offering a specific service within a 1 km radius. Searching on Google reveals the following options:
Use Objectify with geohashes - not sure how accurate and scalable this solution is
Use Google Search - it needs the entities (or parts of them) duplicated as documents, and I'm not sure whether it can support dynamically updated locations
Use another database, such as MongoDB
Assuming a few million entities with dynamically updated latitude/longitude, please suggest an appropriate option.
thanks
Ittium
I've used geohashes. It works, although you end up selecting more data than the exact bounds you are looking for and then filtering out the extra. This might or might not be a good solution depending on your specific application. It requires writing more code but has fewer moving parts (all in the datastore).
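To make the geohash approach concrete, here is a minimal sketch of the standard geohash encoding in Python (not Objectify or datastore code; the function name is my own). Points that share a hash prefix fall in the same grid cell, so a prefix query returns a superset of the matches, which is why the exact distance filter is still needed afterwards.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lng: float, precision: int = 9) -> str:
    """Encode a lat/lng pair (degrees) as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    value = 0          # bits collected for the current character
    bit_count = 0
    use_lng = True     # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                value = (value << 1) | 1
                lng_lo = mid
            else:
                value <<= 1
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 11))
```

One caveat with this scheme: two points can be very close together yet sit on opposite sides of a cell boundary and share no useful prefix, so a real query usually covers the neighbouring cells as well.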
Google Search and "other database" are basically the same architectural pattern - use the task queue to replicate updates to an external index. If you want a quick solution, the search service is probably the easiest to wrap your head around.
Just pick one solution and run with it for a while. You can always reindex the data into a different solution.
It really depends on your query rate, but I usually prefer to use Google Search. Building and maintaining docs is pretty simple, and you get a separate quota to handle these queries.

Store location information, or use a third party source?

I'm working on a location-based web app (learning purposes), where users will rate local businesses. I then want users to be able to see local businesses, based on where they live and a given range (i.e., businesses within 10 miles of 123 Street. City St, 12345).
I'm wondering what I should use for location info: some third-party source (like Google's geocoding API), or host my own location database? I know of zip-code databases that come with the lat/lon of each zip code along with other data, but these databases are often incomplete, and definitely not global.
I know that most APIs set usage limits, which may be a deciding factor. I suppose what I could do is store all data retrieved from Google in a database of my own, so that I never make the same query twice.
What do you guys suggest? I've tried looking at existing answers on SO, but nothing helped.
EDIT To be clear, I need a way to find all businesses that fall within a certain range of a given location. Is there a service that I could use to do this (i.e., return all cities, zips, etc. that fall within range of a given location)
Storing the data you retrieve in a local cache is always a good idea. It will reduce lag and keep from taxing whatever API you are using. It can also help keep you under usage limits as you stated. You can always place size limits on that cache and clear it out as it ages if the need arises.
Using an API means that you'll only be pulling data for sites you need information on, versus buying a bunch of data and having to load/host it all yourself (these datasets can get huge). I suggest using an API plus caching.
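A rough sketch of the API-plus-caching idea, with the size limit and ageing mentioned above, might look like this. Everything here is illustrative: `TTLCache` and `cached_geocode` are names I made up, the default limits are arbitrary, and `geocode` stands for whatever function actually calls the remote API.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Small in-memory cache with a maximum size and a per-entry age limit."""

    def __init__(self, max_entries=10000, ttl_seconds=30 * 24 * 3600, clock=time.time):
        self._items = OrderedDict()   # key -> (stored_at, value)
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock           # injectable so ageing can be tested

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at > self._ttl:
            del self._items[key]      # entry has aged out
            return None
        self._items.move_to_end(key)  # mark as recently used
        return value

    def put(self, key, value):
        self._items[key] = (self._clock(), value)
        self._items.move_to_end(key)
        while len(self._items) > self._max_entries:
            self._items.popitem(last=False)  # evict least recently used


def cached_geocode(address, cache, geocode):
    """Look up an address in the cache first; call `geocode` only on a miss."""
    key = " ".join(address.lower().split())  # normalise case and whitespace
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = geocode(address)
    if result is not None:
        cache.put(key, result)
    return result
```

For anything that has to survive restarts, the same interface could sit in front of a database table instead of a dictionary.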

Data mining town and city names

I want to be able to create a full list of cities and towns for any given country.
I have been looking at the Google Maps API, but it does not seem suited to the purpose: it doesn't return enough results for a given query, and as far as I can tell there is no way to ask for the "next" results. It is clear, however, that Google Maps does contain the information I am looking for; the problem is getting to it.
The source doesn't need to be google maps, of course. Bing maps etc could also be a possibility. Any ideas?
Some of the postal companies' web sites for each country offer downloadable post code/ZIP tables. This would be the easiest place to start searching.
http://www.foreign-trade.com/resources/country-code.htm may also be a good start for some countries.
For Australia : http://auspost.com.au/products-and-services/download-postcode-data.html

How to get book metadata?

My application needs to retrieve information about any published book based on a provided ISBN, title, or author. This is hardly a unique requirement---sites like Amazon.com, Chegg.com, and even software like Book Collector seem to be able to do this easily. But I have not been able to replicate it.
To clarify, I do not need to search the entire database of books---only a limited subset which have been inputted, as in a book collection. The database would simply allow me to tag the inputted books with the necessary metadata to enable search on that subset of books. So scale is not the issue here---getting the metadata is.
The options I have tried are:
Scrape Amazon. Scraping the regular Amazon pages was not very robust to things like missing authors, and while scraping the smaller mobile pages was faster, they shared the same issues with robustness of extraction. Plus, building this into an application is a clear violation of Amazon's Terms of Service.
Scrape the Library of Congress. While this seems to have fewer legal ramifications, ease and robustness were again issues.
ISBNdb.com API. While the service is free up to a point, and does a good job of returning the necessary metadata, I need to do this for over 500 books on a daily basis, at which point this service costs money proportional to use. I'd prefer a free or one-time payment solution that allows me to do the same.
Google Book Data API. While this seems to provide the information I need, I cannot display the book preview as their terms of service require.
Buy a license to a database of books. For example, companies like Ingram or Baker & Taylor provide these catalogs to retailers and libraries. This solution is obviously expensive, so I'm hoping that there's a more elegant solution I've missed. But if not, and someone on SO has had a good experience with a particular database, I'm willing to go with that.
I've tried to describe my approach in detail so others with fewer books can take advantage of the above solutions. But given my requirements, I'm at my wits' end for retrieving book metadata.
Since it is unlikely that you have to retrieve the same 500 books every day: store the data retrieved from isbndb.com in a database and fill it up book by book.
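The "fill it up book by book" approach could be sketched with SQLite along these lines. This is an assumption-laden illustration: the table layout and function names are mine, and `fetch_remote` is a placeholder for whatever code actually calls the metadata service (I am not reproducing the ISBNdb API here).

```python
import json
import sqlite3


def open_book_cache(path=":memory:"):
    """Open (or create) a local store for book metadata keyed by ISBN."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "isbn TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
    )
    return conn


def get_book(conn, isbn, fetch_remote):
    """Return metadata from the local store, asking the remote service only once."""
    isbn = isbn.replace("-", "").strip()
    row = conn.execute(
        "SELECT metadata FROM books WHERE isbn = ?", (isbn,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])      # already stored: no remote request
    metadata = fetch_remote(isbn)      # hypothetical remote lookup
    if metadata is not None:
        conn.execute(
            "INSERT OR REPLACE INTO books (isbn, metadata) VALUES (?, ?)",
            (isbn, json.dumps(metadata)),
        )
        conn.commit()
    return metadata
```

With this in place, only books that have never been seen before count against the remote service's daily quota.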
Instead of scraping Amazon, you can use the API they expose for their affiliate program: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
It allows about 3k requests per hour and returns well-formed XML. It requires you to set a link to the book that you show the information about, and you must state that you are an affiliate partner.
This might be what you're looking for. They even offer a complete download!
https://openlibrary.org/data
It seems that a lot of libraries and other organisations make information such as the ISBN available through MAchine-Readable Cataloging (MARC); you can find more information about it here as well.
Now knowing the "right" term to search for I discovered WorldCat.org.
Maybe this whole MARC thing gives you a new kind of an idea :)

Where can I find a city/neighborhood database?

Where can I find a database of cities and neighborhoods using MySQL? I'm only interested in US areas. Price doesn't matter.
The database must help identify locations by ZIP code. I've already got a database showing cities and states, but I need to find surrounding neighborhoods as well.
I saw a good example at http://www.oodle.com/.
The Zillow Neighborhood data has a CC-sharealike license and it is pretty comprehensive. It is widely used in the Geospatial world nowadays.
Cheers
For a fee... you can subscribe to Maponics' Neighborhood dataset
While Maponics provides mostly GIS data, (eg. allowing one to pinpoint on a map the boundaries of neighborhoods and such), the simple neighborhood list is also available, I think.
Another commercial offering is Urban Mapping's
If you target particular cities/counties, there are plenty of free resources to be found, often on .gov / .us sites for specific cities and counties. Unfortunately, aside from the difficulty of locating such resources (there doesn't seem to be any practical directory of such locally managed government databases), there is no standard for the format in which the data is stored or for the specific semantics of the data collected. Luckily, the ZIP code is rather unambiguous, and the neighborhood concept relatively general (even though the neighborhoods themselves can be quite dynamic, with both the introduction of new neighborhood names and some minor shifting of boundaries).
The overall complexity of the task of compiling such databases, the long half-life of the data, and the potentially lucrative uses of such data, seem to explain why it is hard to find non-commercial sources.
This is an old question - but there is a far better and EASIER way of doing it as of June 2015:
http://maps.googleapis.com/maps/api/geocode/json?address=YOUR_ADDRESS&sensor=false
Example:
http://maps.googleapis.com/maps/api/geocode/json?address=11%20W%2053rd%20St%20New%20York&sensor=false
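As a rough sketch, the request URL above can be built and the JSON reply unpacked like this. The endpoint and the `address`/`sensor` parameters are taken from the URLs above; the helper names are mine, the sample payload is a trimmed, made-up reply in the shape the service returns (`status`, `results[0].geometry.location`), and newer versions of the service may additionally require HTTPS and an API key, which this sketch does not cover.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "http://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address: str) -> str:
    """Build the request URL, letting urlencode handle the escaping."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "sensor": "false"})


def extract_lat_lng(payload: dict):
    """Pull (lat, lng) out of a decoded geocoding reply, or None on a miss."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


if __name__ == "__main__":
    print(build_geocode_url("11 W 53rd St New York"))
    # Trimmed, made-up reply used only to show the parsing step.
    sample = {
        "status": "OK",
        "results": [{"geometry": {"location": {"lat": 40.76, "lng": -73.98}}}],
    }
    print(extract_lat_lng(sample))
```

Note that `urlencode` writes spaces as `+` rather than `%20`; both forms are valid in a query string.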
Here's a great site offering free databases for both cities and countries:
http://ipinfodb.com/ip_database.php
Yelp has a neighborhood API.
http://www.yelp.com/developers/documentation/technical_overview
It might be worth checking out some of the links in this article. There are several where you might find the data you're after.
Infochimps has the Zillow Neighborhoods API:
http://www.infochimps.com/datasets/zillow-neighborhoods
Maponics has over 150,000 neighborhoods worldwide available in MySQL and other formats, as well as an API.
Urban Mapping has an API to find neighborhoods by address, City/State, and as you need in your case, Zip Code (called the getNeighborhoodsByPostalCode method).
Here is a link to their demo apps which show how it works:
URBANWARE API Demo Applications
Edit:
Urban Mapping doesn't exist anymore, and the Demo link has linkrot; here's what it did look like, via Wayback Machine
While this isn't a database per se, you could quickly populate your own database by calling their API for every Zip code you'd be interested in seeing.
Note that this is part of their Premium API. If you have the long/lat coordinates of each city, you can use their free API to get a list of neighborhoods whose boundaries contain the long/lat coordinates.

Resources