Where can I find a city/neighborhood database?

Where can I find a database of cities and neighborhoods using MySQL? I'm only interested in US areas. Price doesn't matter.
The database must help identify locations by ZIP code. I've already got a database showing cities and states, but I need to find surrounding neighborhoods as well.
I saw a good example at http://www.oodle.com/.

The Zillow Neighborhood data has a CC-sharealike license and it is pretty comprehensive. It is widely used in the Geospatial world nowadays.
Cheers

For a fee... you can subscribe to Maponics' Neighborhood dataset
While Maponics provides mostly GIS data (e.g. allowing one to pinpoint the boundaries of neighborhoods on a map), the simple neighborhood list is also available, I think.
Another commercial offering is Urban Mapping's
If you target particular cities or counties, there are plenty of free resources to be found, often on the .gov / .us sites for those cities and counties. Unfortunately, aside from the difficulty of locating such resources (there doesn't seem to be any practical directory of such local-government-managed databases), there is no standard for the format in which the data is stored or for the specific semantics of the data collected. Luckily, the ZIP code is rather unambiguous, and the neighborhood concept relatively general (even though the neighborhoods themselves can be quite dynamic, with both the introduction of new neighborhood names and some minor shifting of boundaries).
The overall complexity of the task of compiling such databases, the long half-life of the data, and the potentially lucrative uses of such data, seem to explain why it is hard to find non-commercial sources.

This is an old question - but there is a far better and EASIER way of doing it as of June 2015:
http://maps.googleapis.com/maps/api/geocode/json?address=YOUR_ADDRESS&sensor=false
Example:
http://maps.googleapis.com/maps/api/geocode/json?address=11%20W%2053rd%20St%20New%20York&sensor=false
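As a rough sketch of how the response could be used: the geocoder returns JSON whose results carry a list of address components, and components typed "neighborhood" hold the neighborhood name. The helper names below are my own, the sample response is hand-written in that shape (it is not real API output), and as far as I know current versions of the API also expect an API key, which is why the key parameter is there.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key=None):
    """Build a geocoding request URL for a free-form address."""
    params = {"address": address}
    if api_key:  # assumption: newer API versions require a key
        params["key"] = api_key
    return GEOCODE_ENDPOINT + "?" + urlencode(params)


def extract_neighborhoods(response):
    """Collect the long_name of every component typed 'neighborhood'."""
    names = []
    for result in response.get("results", []):
        for component in result.get("address_components", []):
            if "neighborhood" in component.get("types", []):
                if component["long_name"] not in names:
                    names.append(component["long_name"])
    return names


# Hand-written sample in the shape of a geocoder response (made-up values).
sample_response = {
    "status": "OK",
    "results": [{
        "address_components": [
            {"long_name": "Example Heights", "short_name": "Example Heights",
             "types": ["neighborhood", "political"]},
            {"long_name": "Exampleville", "short_name": "Exampleville",
             "types": ["locality", "political"]},
        ]
    }],
}

print(build_geocode_url("11 W 53rd St New York"))
print(extract_neighborhoods(sample_response))
```

Not every address comes back with a neighborhood component, so an empty list is a normal outcome.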

Here's a great site offering free databases for both cities and countries:
http://ipinfodb.com/ip_database.php

Yelp has a neighborhood API.
http://www.yelp.com/developers/documentation/technical_overview

It might be worth checking out some of the links in this article. There are several where you might find the data you're after.

Infochimps has the Zillow Neighborhoods API:
http://www.infochimps.com/datasets/zillow-neighborhoods

Maponics has over 150,000 neighborhoods worldwide available in MySQL and other formats, as well as an API.

Urban Mapping has an API to find neighborhoods by address, by city/state, and, as you need in your case, by ZIP code (the getNeighborhoodsByPostalCode method).
Here is a link to their demo apps which show how it works:
URBANWARE API Demo Applications
Edit:
Urban Mapping doesn't exist anymore, and the Demo link has linkrot; here's what it did look like, via Wayback Machine
While this isn't a database per se, you could quickly populate your own database by calling their API for every Zip code you'd be interested in seeing.
Note that this is part of their Premium API. If you have the long/lat coordinates of each city, you can use their free API to get a list of neighborhoods whose boundaries contain the long/lat coordinates.
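A minimal sketch of the "populate your own database" idea, under some loud assumptions: fetch_neighborhoods is a hypothetical callable standing in for whatever API you end up calling per ZIP code, the table layout is my own, and sqlite3 is used only so the example runs on its own (the same SQL idea carries over to MySQL with minor changes).

```python
import sqlite3


def populate_neighborhoods(conn, zip_codes, fetch_neighborhoods):
    """Store every (zip, neighborhood) pair returned by the lookup callable."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS neighborhoods ("
        "zip TEXT NOT NULL, name TEXT NOT NULL, PRIMARY KEY (zip, name))"
    )
    for zip_code in zip_codes:
        for name in fetch_neighborhoods(zip_code):
            # INSERT OR IGNORE keeps re-runs from creating duplicate rows.
            conn.execute(
                "INSERT OR IGNORE INTO neighborhoods (zip, name) VALUES (?, ?)",
                (zip_code, name),
            )
    conn.commit()


def fake_fetch(zip_code):
    """Stand-in for a real API call; the names are made up."""
    return {"00001": ["Alpha", "Beta"], "00002": ["Gamma"]}.get(zip_code, [])


conn = sqlite3.connect(":memory:")
populate_neighborhoods(conn, ["00001", "00002", "00003"], fake_fetch)
rows = conn.execute(
    "SELECT zip, name FROM neighborhoods ORDER BY zip, name"
).fetchall()
print(rows)
```

In practice you would also want to respect the provider's rate limits and terms of use before looping over every ZIP code.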

Related

Freebase: Is it worth it to base my company's entire database on it?

I'm with a company that is building a venue / artist database for live music and recently came across Freebase. It looks very compelling, even if the data isn't there for new, up-and-coming bands. For those of you who have worked with Freebase, I have a couple questions:
Are there downsides to integrating all of the data entry with Freebase? We are not looking to sell or privatize this information.
What are the weaknesses of Freebase, with regards to usability?
Disclosure: I work on Freebase at Google.
The music data in Freebase is one of our strongest areas and is going to continue to get broader and richer as we continue to load more datasets. For example, we import data from MusicBrainz, clean it up and match the topics against existing topics in Freebase to avoid duplicates.
In terms of downsides, you should be prepared to work with a lot of data. For example, Freebase currently has 4 musical artists named "John Smith", which may or may not be useful for your application, but you'll still need to figure out which one(s) map to the John Smith that your users are interested in. We call this "reconciliation", and it's necessary so that your app knows precisely which topics to query the API for.
Since you mentioned music venues I should also point out that while Freebase has a lot of data about places, we don't yet have a geosearch API so you'd need to roll your own if that's something you need.
Since anyone can edit Freebase, you should also consider using as_of_time to protect your site against vandalism.
Freebase is great for developers because you can easily jump in and clean up bad data or add missing topics. However, one area that has always been a challenge is loading large amounts of data from outside of Google. We've built OpenRefine, which allows folks to upload datasets, but these datasets must pass a QA process that takes some time to complete. It's necessary to have these QA processes to maintain the level of quality in Freebase, but they do slow down the loading of large datasets.
I really hope that you choose to make use of Freebase music data to build your company. I know that there are already a number of music startups happily using our data.

How can I get product information into a database without having to populate it manually?

I am looking for a method of dynamically linking product information based on the name of the product.
For example: User types in "Playstation 3", the site would then go out and grab any information it can, such as picture, retail price, etc. Ideally, it would let you choose the correct item (returns both ps3 controller and ps3 console, user can choose which). It would then use this information in a product listing.
The easiest way I can think to implement this is to use the existing API of a major retailer such as Amazon. I have a couple completely different ideas for sites, one of which would involve selling from amazon (which I would assume they would be ok with) and another which would only be data mining the information. I am concerned they would not take it very kindly if I was just stealing their images and descriptions.
Is there another, maybe less "sneaky" way to accomplish this that wouldn't be legally frowned upon?
Many web-commerce companies expose their data through an API: eBay, Etsy, and Amazon all have API feeds for their products. If the company grants you access to its API (usually by giving you a key/password), you can query its product data directly, typically at the read-only level. Depending on the company, you can often just write to them to request access.
You are correct when you say that most companies wouldn't take kindly to someone web-scraping their product directory and re-using it. That is unethical, and could lead to big trouble with larger companies with a significant legal presence.
On the other hand, there is nothing to prevent you from cobbling together several API feeds into a Mash-Up - try Yahoo Pipes! to learn the basics of API/Mash-Up integration:
Yahoo Pipes:
http://pipes.yahoo.com/pipes/
Here is the link to Amazon's Product Advertising API program:
https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
Good luck, and happy development!
Many online retailers provide a product feed - either well-publicized (William M-B has listed some examples), or sorta-kinda hidden, for the purposes of affiliate marketing. They usually have terms of use around those product feeds, describing in detail what you're allowed to do with them, and exactly how many of your limbs are at risk if you don't play by their rules.
However, the mechanism you're describing sounds remarkably similar to a search engine; there's a well-established precedent for search engines indexing sites, and using their content to reason about the underlying site. Get a lawyer to validate this, but there's a good chance that your intended purpose falls under "fair use".
I'm a representative of http://aerse.com.
We are building a service that does the following:
searches for a product by name, for example: galaxy s3, galaxy s 3 or galaxy sIII
returns technical specifications (CPU, RAM, etc.) and product images (thumbnails and high-res images)
provides an API: http://aerse.com/p
deals with legal issues, provides licenses, etc.

How to get book metadata?

My application needs to retrieve information about any published book based on a provided ISBN, title, or author. This is hardly a unique requirement---sites like Amazon.com, Chegg.com, and even software like Book Collector seem to be able to do this easily. But I have not been able to replicate it.
To clarify, I do not need to search the entire database of books---only a limited subset which have been inputted, as in a book collection. The database would simply allow me to tag the inputted books with the necessary metadata to enable search on that subset of books. So scale is not the issue here---getting the metadata is.
The options I have tried are:
Scrape Amazon. Scraping the regular Amazon pages was not very robust to things like missing authors, and while scraping the smaller mobile pages was faster, they shared the same issues with robustness of extraction. Plus, building this into an application is a clear violation of Amazon's Terms of Service.
Scrape the Library of Congress. While this seems to have fewer legal ramifications, ease and robustness were again issues.
ISBNdb.com API. While the service is free up to a point, and does a good job of returning the necessary metadata, I need to do this for over 500 books on a daily basis, at which point this service costs money proportional to use. I'd prefer a free or one-time payment solution that allows me to do the same.
Google Book Data API. While this seems to provide the information I need, I cannot display the book preview as their terms of service require.
Buy a license to a database of books. For example, companies like Ingram or Baker & Taylor provide these catalogs to retailers and libraries. This solution is obviously expensive, so I'm hoping that there's a more elegant solution I've missed. But if not, and someone on SO has had a good experience with a particular database, I'm willing to go with that.
I've tried to describe my approach in detail so others with fewer books can take advantage of the above solutions. But given my requirements, I'm at my wits' end for retrieving book metadata.
Since it is unlikely that you have to retrieve the same 500 books every day: store the data retrieved from isbndb.com in a database and fill it up book by book.
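A sketch of that caching idea, with assumptions labeled: fetch_remote is a hypothetical callable standing in for the paid lookup (isbndb.com or any other source), the table layout is my own, and sqlite3 is used only to keep the example self-contained.

```python
import json
import sqlite3


class BookCache:
    """Look up book metadata locally first; hit the remote service on a miss."""

    def __init__(self, conn, fetch_remote):
        self.conn = conn
        self.fetch_remote = fetch_remote  # hypothetical: isbn -> dict
        conn.execute(
            "CREATE TABLE IF NOT EXISTS books ("
            "isbn TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
        )

    def get(self, isbn):
        row = self.conn.execute(
            "SELECT metadata FROM books WHERE isbn = ?", (isbn,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit, no remote call
        metadata = self.fetch_remote(isbn)
        self.conn.execute(
            "INSERT INTO books (isbn, metadata) VALUES (?, ?)",
            (isbn, json.dumps(metadata)),
        )
        self.conn.commit()
        return metadata


remote_calls = []


def fake_remote(isbn):
    """Stand-in for the real service; records each call and returns made-up data."""
    remote_calls.append(isbn)
    return {"isbn": isbn, "title": "Placeholder Title"}


cache = BookCache(sqlite3.connect(":memory:"), fake_remote)
first = cache.get("0000000000")
second = cache.get("0000000000")  # served from the local table
print(len(remote_calls), first == second)
```

Each book is paid for at most once this way, so the daily request count only reflects titles you have never seen before.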
Instead of scraping Amazon, you can use the API they expose for their affiliate program: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
It allows about 3k requests per hour and returns well-formed XML. It requires you to set a link to the book that you show the information about, and you must state that you are an affiliate partner.
This might be what you're looking for. They even offer a complete download!
https://openlibrary.org/data
As it seems, a lot of libraries and other organisations make information such as the ISBN available through MAchine-Readable Cataloging, aka MARC; you can find more information about it here as well.
Now knowing the "right" term to search for, I discovered WorldCat.org.
Maybe this whole MARC thing gives you a new kind of an idea :)

Finding lat/lng based on postal code

I have a dataset of thousands of full addresses of businesses (specifically in the Netherlands, but I guess the question can apply everywhere).
I want to find the lat/lng so I can do distance calculations, but because of the size of the dataset I'm worried it's not a wise idea to do this using, for example, Google Maps.
Is there a webservice I could query to find all this info?
The Google Geocoder web service is available for this:
http://code.google.com/apis/maps/documentation/geocoding/index.html
It's free (unless you abuse it, or volumes get too big), and returns JSON or XML.
I've been using Google, but it misses many (Scandinavian) addresses which are caught by Yahoo. See http://developer.yahoo.com/maps/rest/V1/geocode.html and at least compare the two for your needs. If I were you, I would have every address that Google misses geocoded by Yahoo as a fallback (or the other way around).
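The fallback idea can be sketched roughly like this. The geocoder callables are hypothetical stand-ins (each takes an address and returns a (lat, lng) tuple or None); wiring them to the actual Google or Yahoo services is left out, and the coordinates are made up.

```python
def geocode_with_fallback(address, geocoders):
    """Try each (name, callable) pair in order; return the first hit.

    Returns (provider_name, (lat, lng)), or None if every provider misses.
    """
    for name, geocode in geocoders:
        coords = geocode(address)
        if coords is not None:
            return name, coords
    return None


# Stand-in providers with made-up data, for illustration only.
def primary(address):
    return {"Known Street 1": (10.0, 20.0)}.get(address)


def secondary(address):
    return {"Obscure Lane 2": (30.0, 40.0)}.get(address)


providers = [("primary", primary), ("secondary", secondary)]
print(geocode_with_fallback("Known Street 1", providers))
print(geocode_with_fallback("Obscure Lane 2", providers))
print(geocode_with_fallback("Nowhere 3", providers))
```

Keeping the provider name alongside the coordinates makes it easy to measure afterwards how often each service was actually needed.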
Accurate postcode information is owned by someone in most jurisdictions and they charge for supplying the lat/lng information. In the UK it is the Post Office, I don't know about the Netherlands, but this looks quite promising. Even Google's geocoder is not that accurate outside the US.
One thing I should mention is that the lat/lng will not be sufficient for you to calculate distances (unless you are going everywhere by crow). One of the real advantages of Google's service is that GDirections uses knowledge of the road system and estimates journey time. If you are solving some sort of travelling salesman problem, lat/lng alone is not going to give you a very good estimate of actual distance, especially in cities.
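For reference, the "by crow" distance mentioned above is the great-circle distance, which can be computed from two lat/lng pairs with the haversine formula. This is a sketch assuming a spherical Earth with a mean radius of 6371 km; it gives a lower bound on travel distance, not a road distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# A quarter of the way around the equator.
print(haversine_km(0.0, 0.0, 0.0, 90.0))
```

For a rough "which businesses are within N km" filter this is usually enough; for journey times you still need a routing service, as the answer says.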
HTH
Not sure of the quality/accuracy of the geocode but this could be an option, http://www.opengeocoding.org/geocoding/geocod.html

Open Source Address Scrubber?

I have a set of names and addresses that have been entered into an Excel spreadsheet, but the problem is that the many people who entered the addresses did so in many different non-standard formats. I want to scrub the addresses before transferring all of them to my database. Looking around, all I really found in the way of address scrubbers (parsers or formatters) is the one that is put out by Semaphore. For my purposes, I don't really need all of that, and I don't want to pay the licensing fees for the software. Is there anything out there that is free and/or open source that will do the scrubbing for me?
Since I work in the mailing business ...
A mailable address is not the same thing as geocoding. The former allows the USPS to deliver mail to a location; the latter tells you where on earth that point is. The USPS does not geocode its mailable addresses. Geocoding is useful for marking areas/regions of people for targeting.
You're not buying a license to the software, you're buying the data. The post office has lots of rules especially if you're doing this commercially and trying to get a better rate than first class. See USPS Domestic Mail Manual for the complete list of rules. The USPS moves zips and households between zips all the time. The company (I work for) pays the USPS for its updated mailing list so we can keep our DBs updated. Weekly.
Back to your question. Do you want to change the data into a common format (street -> st), or are you looking for duplicates and want to store only real mailable addresses?
For a common format, you can break the address into pieces, clean up the whitespace, and apply a dictionary of terms/translations. Then apply some SQL to find the duplicates. Keep in mind that households (1 main st) are different from persons (john doe, 1 main st).
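A minimal sketch of the common-format approach: lowercase, strip punctuation, collapse whitespace, and translate tokens through a dictionary. The abbreviation table here is a tiny illustrative subset of my own, not an official list, and the duplicate check groups in Python rather than in SQL just to keep the example self-contained.

```python
import re

# Illustrative subset only; a real table would be much longer.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "road": "rd",
    "drive": "dr",
    "apartment": "apt",
}


def normalize_address(raw):
    """Lowercase, drop periods/commas, collapse whitespace, abbreviate terms."""
    cleaned = re.sub(r"[.,]", " ", raw.lower())
    tokens = cleaned.split()  # split() also collapses repeated whitespace
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def find_duplicates(addresses):
    """Group raw addresses by normalized form; keep groups with 2+ entries."""
    groups = {}
    for raw in addresses:
        groups.setdefault(normalize_address(raw), []).append(raw)
    return {key: raws for key, raws in groups.items() if len(raws) > 1}


entries = ["1 Main Street", "1  main St.", "2 Oak Avenue"]
print(find_duplicates(entries))
```

This only standardizes spelling; it says nothing about whether an address is actually deliverable.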
For the mailable addresses, well, some of you (the readers) won't like this answer, but you want information and that isn't free. Someone spends time or money to acquire and maintain these lists. So, find a business model to acquire funds for the list or go to someone who will do it for you. Data and mail management
Realistically, Semaphore is pretty cheap, just keep in mind that the address db will have to be updated quarterly and $19/quarter is pretty cheap.
Another address-scrubbing product is SAP PostalSoft. I don't know what the data will cost, though.
I actually work in the address verification industry... Jim's answer is a smart accept. Unfortunately for those of us with low budgets, official USPS data is pricey and the systems are complicated. (I know by experience, since the company I work for, SmartyStreets, provides address verification at lower rates than most.)
The best I can do here to help is recommend a low-cost/free alternative (depending on your volume) such as LiveAddress, where for a list of addresses there's no minimum purchase, and the API is super-cheap and super-easy, comparatively.
A .NET wrapper for the USPS APIs
http://www.codeproject.com/KB/cs/USPS_Web_Tools_Wrapper.aspx
Most of the software that I've worked with to do this is very expensive (or to put it another way, marketing departments are naive and have huge budgets).
This sort of work is a precursor to Geo-coding. This linked Wiki article includes a list of Geocoding software, some of which is free. If you're lucky, some of the free ones may include address standardizing routines.
If you find a good one, let me know.
We use Accuzip. It's a lot cheaper than most solutions (~$700/year) and comes with bi-monthly updates. It uses the USPS address standardization API, for which I've written a .NET wrapper. This allows me to run it in real-time (Accuzip, by default, comes only with a batch mode).

Resources