Data mining town and city names - maps

I want to be able to create a full list of cities and towns for any given country.
I have been looking at the google maps api, but it seems to not be suited for the purpose given that it doesn't return enough results for a given query and a there is no way to ask for the "next" results for the query as far as I can tell. It is however clear that google maps does contain the information I am looking for, the matter is getting to it.
The source doesn't need to be google maps, of course. Bing maps etc could also be a possibility. Any ideas?

Some of the 'Postal Comapanies' web sites for each country contain downloaded Post Code/Zip tables. This would be the easiest place to start searching.
http://www.foreign-trade.com/resources/country-code.htm may also be a good start for some countries.
For Australia : http://auspost.com.au/products-and-services/download-postcode-data.html

Related

Spatial Search Objectify, appengine

I want to use, objectify for spatial search. I have entities that have longitude and latitude associated with them. Latitude and longitude information is dynamic e.g. service providers (like electrician, carpenter) in a city. I want to implement a query that gives me service providers providing some specific service in 1 Km radius. Searching on google reveals following options
Use Objectify with geohashes - Not sure, how accurate and scalable this solution is
Use Google Search - It will need entities(or part of it) duplicated in the form of documents and Will it be able to support dynamically updated locations.
Use other database like mongodb
Assuming few millions entities and latitude/longitude dynamically updated, please suggest me an appropriate option.
thanks
Ittium
I've used geohashes. It works, although you end up selecting more data than the exact bounds you are looking for and then filtering out the extra. This might or might not be a good solution depending on your specific application. It requires writing more code but has fewer moving parts (all in the datastore).
Google search and "other database" are basically the same architectural pattern - use the task queue to replicate updates to an external index. If you want a quick solution, the search service is probably is the easiest to wrap your head around.
Just pick one solution and run with it for a while. You can always reindex the data into a different solution.
It really depends on your query rate but I usually prefer to use google search. Building and maintaining docs is pretty simple and you get a different quota to handle this queries.

Storing 100k map markers in App Engine

I'm designing yet another "Find Objects near my location" web site and mobile app.
My requirements are:
Store up to 100k objects;
Query for objects that are close to the point (my location, city, etc). And other search criteria (like object type);
Display results on the Google Maps with smooth performance.
Let user filter objects by object time.
I'm thinking about using Google App Engine for this project.
Could You recommend what would be the best data storage option for this?
And couple of words about dynamic data loading strategy.
I kinda feel overwhelmed with options at the moment and looking for hints where should I continue my research.
Thanks a lot!
I'm going to to assume that you are using the datastore. I'm not familiar with Google Cloud SQL (which I believe aims to offer MySQL-like features in the cloud), so I can't speak if it can do geospatial queries.
I've been looking into the whole "get locations in proximity of a location" problem for a while now. I have some good and bad news for you, unfortunately.
The best way to do the proximity search in the Google Environment is via the Search Service (https://developers.google.com/appengine/docs/python/search/ or find the JAVA link ). Reason being is that it supports a "Geopoint Field", and allows you to query in such a way.
Ok, cool, so there is support, right? However, "A query is complex if its query string includes the name of a geopoint field or at least one OR or NOT boolean operator". The free quota for Complex Search Queries are 100/day. Per 10,000 queries, it costs 60 cents. Depending on your application, this may be an issue.
I'm not too familar with the Google Maps API you might be able to pull off something like this :(https://developers.google.com/maps/articles/phpsqlsearch_v3)
My current project/problem involves moving locations, and not "static" ones (stores, landmarks,etc). I've decided to go with Amazon's Dynamodb and they have a library which supports geospatial indexing : http://aws.amazon.com/about-aws/whats-new/2013/09/05/announcing-amazon-dynamodb-geospatial-indexing/

How can I get product information intoa database without having to populate it manually?

I am looking for a method of dynamically linking product information based on the name of the product.
For example: User types in "Playstation 3", the site would then go out and grab any information it can, such as picture, retail price, etc. Ideally, it would let you choose the correct item (returns both ps3 controller and ps3 console, user can choose which). It would then use this information in a product listing.
The easiest way I can think to implement this is to use the existing API of a major retailer such as Amazon. I have a couple completely different ideas for sites, one of which would involve selling from amazon (which I would assume they would be ok with) and another which would only be data mining the information. I am concerned they would not take it very kindly if I was just stealing their images and descriptions.
Is there another way, maybe less "sneaky" way to accomplish this that wouldn't be in legally frowned upon ?
Many web-commerce companies use a data stream known as an API - EBay, Etsy, and Amazon all have API feeds for their products. If you can convince the company to allow you access to their API (usually they will give you a key/password), then you can directly access their back-end database, typically at the read-only level. Depending on the company, you can just write them directly for access.
You are correct when you say that most companies wouldn't take kindly to someone web-scraping their product directory and re-using it. That is unethical, and could lead to big trouble with larger companies with a significant legal presence.
On the other hand, there is nothing to prevent you from cobbling together several API feeds into a Mash-Up - try Yahoo Pipes! to learn the basics of API/Mash-Up integration:
Yahoo Pipes:
http://pipes.yahoo.com/pipes/
Here is the link to Amazon's Product Advertising API program:
https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
Good luck, and happy development!
Many online retailers provide a product feed - either well-publicized (William M-B has listed some examples), or sorta-kinda hidden, for the purposes of affiliate marketing. They usually have terms of use around those product feeds, describing in detail what you're allowed to do with them, and exactly how many of your limbs are at risk if you don't play by their rules.
However, the mechanism you're describing sounds remarkably similar to a search engine; there's a well-established precedent for search engines indexing sites, and using their content to reason about the underlying site. Get a lawyer to validate this, but there's a good chance that your intended purpose falls under "fair use".
I'm representative of http://aerse.com.
We are building service, that do the following:
search product by name. For example: galaxy s3, galaxy s 3 or galaxy sIII
return technical specifications (CPU, RAM etc) and product images (thumbnails and high-res images)
provide API http://aerse.com/p
deal with legal issues, provide licenses & etc.

Finding lat/lng for based on postalcode

I have a dataset of thousands of full addresses of business (specifically in the netherlands, but I guess the question can apply everywhere).
I want to find the lat/lng so I can do distance calculation, but because of the size of the dataset I'm worried it's not a wise idea to do this using for example google maps.
Is there a webservice I could query to find all this info?
The Google Geocoder web service is available for this:
http://code.google.com/apis/maps/documentation/geocoding/index.html
It's free (unless you abuse it, or volumes get too big), and returns JSON or XML.
I've been using Google but it misses many (Scandinavian) addresses which are caught by Yahoo. See http://developer.yahoo.com/maps/rest/V1/geocode.html and at least compare the two for your needs. If I were you I would have every miss returned by Google to be geocoded by Yahoo as fallback (or the other way around.)
Accurate postcode information is owned by someone in most jurisdictions and they charge for supplying the lat/lng information. In the UK it is the Post Office, I don't know about the Netherlands, but this looks quite promising. Even Google's geocoder is not that accurate outside the US.
One thing I should mention is that the lat/lng will not be sufficient for you to calculate distances (unless you are going everywhere by crow). One of the real advantages of Google's service is that GDirections uses knowledge of the road system and estimates journey time. If you are solving some sort of travelling salesman problem, lat/lng alone is not going to give you a very good estimate of actual distance, especially in cities.
HTH
Not sure of the quality/accuracy of the geocode but this could be an option, http://www.opengeocoding.org/geocoding/geocod.html

Where can I find a city/neighborhood database?

Where can I find a database of cities and neighborhoods using MySQL? I'm only interested in US areas. Price doesn't matter.
The database must help identify locations by ZIP code. I've already got a database showing cities and states, but I need to find surrounding neighborhoods as well.
I saw good example on http://www.oodle.com/.
The Zillow Neighborhood data has a CC-sharealike license and it is pretty comprehensive. It is widely used in the Geospatial world nowadays.
Cheers
For a fee... you can subscribe to Maponics' Neighborhood dataset
While Maponics provides mostly GIS data, (eg. allowing one to pinpoint on a map the boundaries of neighborhoods and such), the simple neighborhood list is also available, I think.
Another commercial offering is Urban Mapping's
In you target particular cities/counties, there are plenty of free resources to be found, oft' in the .gov / .us sites, for specific cities and counties. Unfortunately aside from the difficulty of locating such resources (there doesn't seem to exist any practical directory for such local gov-managed databases), there is no standard as to the format in which the data is stored or the specific semantics of the data collected. Luckily, ZIP-code is rather unanbiguous, and he neighborhood concept relatively general (even though the neighborhoods themselves can be quite dynamic, with bot the introduction of new neighborhood names, and some minor shifting of boundaries).
The overall complexity of the task of compiling such databases, the long half-life of the data, and the potentially lucrative uses of such data, seem to explain why it is hard to find non-commercial sources.
This is an old question - but there is a far better and EASIER way of doing it as of June 2015:
http://maps.googleapis.com/maps/api/geocode/json?address=YOUR_ADDRESS&sensor=false
Example:
http://maps.googleapis.com/maps/api/geocode/json?address=11%20W%2053rd%20St%20New%20York&sensor=false
Here's a great site offering free databases for both cities and countries:
http://ipinfodb.com/ip_database.php
Yelp has a neighborhood API.
http://www.yelp.com/developers/documentation/technical_overview
It might be worth checking out some of the links in this article. There are several where you might find the data you're after.
Infochimps has the Zillow Neighborhoods API:
http://www.infochimps.com/datasets/zillow-neighborhoods
Maponics has over 150,000 neighborhoods worldwide available in MySQL and other formats, as well as an API.
Urban Mapping has an API to find neighborhoods by address, City/State, and as you need in your case, Zip Code (called the getNeighborhoodsByPostalCode method).
Here is a link to their demo apps which show how it works:
URBANWARE API Demo Applications
Edit:
Urban Mapping doesn't exist anymore, and the Demo link has linkrot; here's what it did look like, via Wayback Machine
[
While this isn't a database per se, you could quickly populate your own database by calling their API for every Zip code you'd be interested in seeing.
Note that this is part of their Premium API. If you have the long/lat coordinates of each city, you can use their free API to get a list of neighborhoods whose boundaries contain the long/lat coordinates.

Resources