I'm building a site where my users will be able to specify locations (say, their residence). Then, I want to do two things with this information:
Plot these locations on a mapping service (such as Google Maps)
Allow users to search by location (e.g. find all users who live in, or within a certain radius of, XYZ city)
The question I have is this: what is the best way for me to implement such features?
My concern is that my database of location information may or may not be in sync with the mapping service I use.
For example, say I have a list of cities and a user picks XYZ city from my list. Later, it turns out that city is not recognized by the mapping service, so there's no way to plot the location on the map (though I could still provide the search-by-location feature).
If I try to use the database of the mapping service (if I can get a hold of something like that), I may end up being "locked in" to using that mapping service. Plus, these databases tend to be HUGE (and probably too much information for a simple application like mine).
Any recommendations on how to go about solving this problem? Thanks.
You should not rely on city names alone; take the coordinates into account (watch out for different mappings of the world, as these will change the coordinates). If you use coordinates, you are independent of the service, because x,y will always be x,y no matter the application or geo-provider. With the help of geocoding (and reverse-geocoding) services, you should be able to build something like that pretty easily.
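As a minimal sketch of what I mean (this uses geopy's Nominatim geocoder purely as an example provider; any geocoding service would do, and you'd store the resulting coordinates in your own database):

    from geopy.geocoders import Nominatim  # pip install geopy; example provider only

    geolocator = Nominatim(user_agent="my-location-site")  # any descriptive user agent

    def coordinates_for(place_name):
        """Resolve a free-text place name to (lat, lon), or None if the provider doesn't know it."""
        location = geolocator.geocode(place_name)
        if location is None:
            return None
        return (location.latitude, location.longitude)

    # Store the returned lat/lon alongside the user's record; the numbers stay
    # meaningful even if you later switch to a different map or geocoding provider.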
This is a very biased opinion: if (!) you stick to, say, Google, you can be quite sure that they won't quit the service in the next 1-2 years (maybe if some new uber-technology appears, but I doubt that very much). So I think it is OK to stick to a single service if you trust it and are sure that it will live for more than a few years.
Our apartment association is planning to implement biometric gate passes (a fingerprint turnstile) for all residents. But residents are worried about the privacy of the fingerprint data stored in the databases. This data resides on the association's hard disks, which are meant to be accessed by some contract employees working in our apartment complex.
How can I make sure the data is secure and not misused or sold?
I found something here; can someone explain how it works?
Actually, a fingerprint template is nothing but a set of features of the finger: crosses, deltas, parallel lines, curves, and so on. So, from the fingerprint template produced by regular attendance/access-control machines, you cannot reconstruct an image of the real fingerprint. The templates are not even unique: every time you register, you get a different string. Also, only the machines have the matching algorithm, so a print can only be validated when the user presses their thumb on the machine itself. So you will not have those security threats. If you do see something as a threat, say specifically what it is and we can suggest a solution.
We have put together a very comprehensive database of retailers across the country with specific criteria. It took over a year of phone interviews, etc., to put together the list. The list is, of course, not openly available on our site to download as a flat file...that would be silly.
But all the content is searchable on the site via Google Maps. So theoretically with enough zip-code searches, someone could eventually grab all the retailer data. Of course, we don't want that since our whole model is to do the research and interviews required to compile this database and offer it to end-users for consumption on our site.
So we've come to the conclusion there isn't really any way to protect the data from being taken en masse by a potentially competing website. But is there a way to watermark the data? Since the lat/lon is pre-calculated in our DB, we don't need the address to be 100% correct. We're thinking of, say, replacing "1776 3rd St" with "1776 Third Street", or replacing standard characters with Unicode look-alikes. That way, if we found this data verbatim on a competing site, we'd know it was plagiarized. The downside is that if users tried to cut and paste the modified addresses into their own instance of Google Maps, in some cases the modification would make that difficult.
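To make the idea concrete, here's a rough sketch of what we're considering; the variant picked for each record would be derived from a stable record ID so the watermark is reproducible (the names and structure here are purely illustrative):

    import hashlib

    # Illustrative watermark: pick one of several equivalent spellings per record,
    # keyed off a stable record id, so the same record always gets the same variant.
    VARIANTS = {
        "St": ["St", "St.", "Street"],
        "Rd": ["Rd", "Rd.", "Road"],
        "Ave": ["Ave", "Ave.", "Avenue"],
    }

    def watermark_address(address, record_id):
        seed = int(hashlib.sha256(record_id.encode()).hexdigest(), 16)
        words = []
        for word in address.split():
            options = VARIANTS.get(word)
            words.append(options[seed % len(options)] if options else word)
        return " ".join(words)

    # watermark_address("1776 3rd St", "retailer-0042") might yield "1776 3rd Street"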
How have other websites with valuable openly-distributed content tackled this challenge? Any suggestions?
Thanks
It is a question of "openly distribute" vs "not openly distribute" if you ask me. If you really want to distribute it, you should acknowledge that someone can receive the data.
With certain kinds of data (media like photos, movies, etc.) you can watermark or otherwise tamper with the data so it becomes traceable, but with content like yours that will be hard, and even harder to defend: if you use "Third Street" and someone else also uses it, do you think you can make a case against them? I highly doubt it.
The only steps I can think of are:
Making it harder to get all the information. Hide it behind scripts and such instead of putting it straight onto Google Maps, make it as hard as you can for bots to harvest the information, limit the number of results shown to one user, etc. This could very well make your service less attractive to the end user; it's a trade-off.
Sort of the opposite of the above: use much the same technique to HIDE some of the data from the common user instead of showing it to them. This would be FAKE data that a normal person shouldn't see (see the sketch after this list). If those retailers show up at your competitors, you've caught them red-handed. This is certainly not fool-proof: they can check their results for validity and remove your fake entries, there is always a chance that a user with an unusual setup gets served the fake data (which makes your content less correct), and if your competitor's scraper looks too much like a real user, it won't get the fake data at all.
Provide two-step info: in step one you get the "about" info, which anyone can find. In step two, after you've confirmed that this is what the user wants (maybe a login, maybe just a request limit, etc.), you give them everything. So if a user searches for easy-to-reach retailers, first say in which area you have some and show it roughly on the map, and only once they have chosen something show them the real details in a limited environment.
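A very rough sketch of the honeypot idea from the second point, assuming you serve results as simple dicts (the storage doesn't matter, the detection step does; every name here is made up):

    # Fabricated "honeypot" retailers mixed into served results; keep the list
    # private so a verbatim match on a competitor's site is strong evidence.
    HONEYPOTS = [
        {"name": "Maple & Finch Supply Co.", "address": "412 Harrow Lane", "zip": "30312"},
        {"name": "Bluegate Hardware", "address": "88 Colden Row", "zip": "30349"},
    ]

    def results_for(zip_code, real_results):
        """Return the real results plus any honeypot records for this zip code."""
        fakes = [h for h in HONEYPOTS if h["zip"] == zip_code]
        return real_results + fakes

    def looks_scraped(competitor_names):
        """True if a competitor's listing contains one of the fabricated retailers."""
        planted = {h["name"] for h in HONEYPOTS}
        return any(name in planted for name in competitor_names)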
I am looking for a method of dynamically linking product information based on the name of the product.
For example: User types in "Playstation 3", the site would then go out and grab any information it can, such as picture, retail price, etc. Ideally, it would let you choose the correct item (returns both ps3 controller and ps3 console, user can choose which). It would then use this information in a product listing.
The easiest way I can think of to implement this is to use the existing API of a major retailer such as Amazon. I have a couple of completely different ideas for sites, one of which would involve selling from Amazon (which I assume they would be OK with) and another which would only be mining the data. I am concerned they would not take it very kindly if I were just lifting their images and descriptions.
Is there another, less "sneaky" way to accomplish this that wouldn't be legally frowned upon?
Many web-commerce companies expose their product data through an API - eBay, Etsy, and Amazon all have API feeds for their products. If you can convince a company to give you access to its API (usually they will issue you a key/password), then you can query their back-end catalog directly, typically at a read-only level. Depending on the company, you can often just write to them directly to ask for access.
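The usage pattern is much the same everywhere; here is a very generic sketch (the endpoint, parameter names, and response shape are invented for illustration, so check the vendor's actual documentation):

    import requests

    API_KEY = "your-issued-key"  # granted by the vendor when they approve access
    SEARCH_URL = "https://api.example-retailer.com/v1/products"  # hypothetical endpoint

    def search_products(query):
        """Search a (hypothetical) retailer product API by name and return parsed JSON."""
        resp = requests.get(
            SEARCH_URL,
            params={"q": query, "apikey": API_KEY},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()  # typically a list of items with name, price, image URLs

    # e.g. search_products("Playstation 3") -> let the user pick console vs. controller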
You are correct when you say that most companies wouldn't take kindly to someone web-scraping their product directory and re-using it. That is unethical, and could lead to big trouble with larger companies that have a significant legal presence.
On the other hand, there is nothing to prevent you from cobbling together several API feeds into a Mash-Up - try Yahoo Pipes! to learn the basics of API/Mash-Up integration:
Yahoo Pipes:
http://pipes.yahoo.com/pipes/
Here is the link to Amazon's Product Advertising API program:
https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
Good luck, and happy development!
Many online retailers provide a product feed - either well-publicized (William M-B has listed some examples), or sorta-kinda hidden, for the purposes of affiliate marketing. They usually have terms of use around those product feeds, describing in detail what you're allowed to do with them, and exactly how many of your limbs are at risk if you don't play by their rules.
However, the mechanism you're describing sounds remarkably similar to a search engine; there's a well-established precedent for search engines indexing sites, and using their content to reason about the underlying site. Get a lawyer to validate this, but there's a good chance that your intended purpose falls under "fair use".
I'm a representative of http://aerse.com.
We are building a service that does the following:
searches for a product by name (for example: galaxy s3, galaxy s 3, or galaxy sIII)
returns technical specifications (CPU, RAM, etc.) and product images (thumbnails and high-res images)
provides an API: http://aerse.com/p
deals with the legal issues, provides licenses, etc.
I'm working on a location-based web app (for learning purposes), where users will rate local businesses. I then want users to be able to see local businesses based on where they live and a given range (i.e., businesses within 10 miles of 123 Street, City, ST 12345).
I'm wondering what I should use for location info: some third-party source (like Google's Geocoding API) or a location database I host myself? I know of zip-code databases that come with the lat/lon of each zip code along with other data, but those databases are often incomplete, and definitely not global.
I know that most APIs set usage limits, which may be a deciding factor. I suppose what I could do is store all the data retrieved from Google in a database of my own, so that I never make the same query twice.
What do you guys suggest? I've tried looking at existing answers on SO, but nothing helped.
EDIT: To be clear, I need a way to find all businesses that fall within a certain range of a given location. Is there a service I could use to do this (i.e., return all cities, zips, etc. that fall within range of a given location)?
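For what it's worth, once I have lat/lon stored for each business, the filter I picture looks roughly like this (a minimal haversine sketch, not production code):

    from math import asin, cos, radians, sin, sqrt

    EARTH_RADIUS_MILES = 3958.8

    def miles_between(lat1, lon1, lat2, lon2):
        """Great-circle distance between two points in miles (haversine formula)."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        dlat, dlon = lat2 - lat1, lon2 - lon1
        a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
        return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

    def within_radius(center_lat, center_lon, businesses, radius_miles=10):
        """businesses is an iterable of (name, lat, lon) tuples."""
        return [b for b in businesses
                if miles_between(center_lat, center_lon, b[1], b[2]) <= radius_miles]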
Storing the data you retrieve in a local cache is always a good idea. It will reduce lag and keep from taxing whatever API you are using. It can also help keep you under usage limits as you stated. You can always place size limits on that cache and clear it out as it ages if the need arises.
Using an API means that you'll only be pulling data for the places you need information on, versus buying a bunch of data and having to load/host it all yourself (those datasets tend to get huge). I suggest using an API plus caching.
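A bare-bones version of such a cache might look like the sketch below; geocode_remote stands in for whatever API call you actually make, and in practice you'd add expiry and use your real database instead of a throwaway SQLite file:

    import json
    import sqlite3

    # Minimal cache: key the raw query string to the provider's response so the
    # same lookup never hits the remote API twice.
    conn = sqlite3.connect("geocode_cache.db")
    conn.execute("CREATE TABLE IF NOT EXISTS cache (query TEXT PRIMARY KEY, response TEXT)")

    def geocode_cached(query, geocode_remote):
        row = conn.execute("SELECT response FROM cache WHERE query = ?", (query,)).fetchone()
        if row:
            return json.loads(row[0])  # cache hit: no API call, no quota used
        result = geocode_remote(query)  # cache miss: one API call
        conn.execute("INSERT INTO cache (query, response) VALUES (?, ?)",
                     (query, json.dumps(result)))
        conn.commit()
        return result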
I have a database that I am accessing through Django & Python. We want to store buildings based on their addresses (not names, since some buildings simply don't have names).
We need to prevent users from entering duplicate entries into our database for the same building. This is made difficult by the different ways people can type in an address (e.g. "1000 Main Street" vs. "1000 Main St.").
In what way can we reliably prevent duplicates? I am using a MySQL database.
Thanks
If you're working only with the U.S., you can use the USPS Address Standardization web service to resolve duplicates:
http://www.usps.com/webtools/address.htm
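Even before you call a standardization service, a crude local normalization plus a unique constraint on the normalized column catches the easy cases; this is only a sketch (real address standardization handles far more than a few abbreviations):

    import re

    ABBREVIATIONS = {
        "st": "street", "st.": "street",
        "rd": "road", "rd.": "road",
        "ave": "avenue", "ave.": "avenue",
    }

    def normalize_address(raw):
        """Crude canonical form: lowercase, strip punctuation, expand common abbreviations."""
        words = re.sub(r"[^\w\s.]", "", raw.lower()).split()
        return " ".join(ABBREVIATIONS.get(w, w.rstrip(".")) for w in words)

    # Store normalize_address(user_input) in its own column with a UNIQUE index;
    # "1000 Main Street" and "1000 Main St." then collide as intended.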
Address de-duplication is a complicated task. While the USPS web service is all right, it's seriously lacking some important features. Plus, it's quite inefficient to perform batch de-duplication through a regular web service, making individual requests and so on.
And, it appears the USPS has updated their site, so the link Dan posted, while useful, is now broken.
As an updated answer, I'd like to point out that I work for SmartyStreets and we remove duplicates from address lists. You could, for example, upload your list to CASS-Certified Scrubbing and the addresses will be standardized and flagged for duplicates. It's really easy this way. If you need point-of-entry validation, take a look at LiveAddress, which provides more important information than the USPS service alone does.