In the Google and Yahoo Maps API, I read in the terms of service:
YOU SHALL NOT: "store or allow end users to store map imagery, map data or geocoded location information from the Yahoo! Maps APIs for any future use;"
The scope of my project is to build a real estate website. A user who posts an ad to sell their house will be able to type in the house's address, or point to it directly on a map. I would then save that latitude/longitude data to a MySQL database so I can retrieve it later when another user is looking for a house for sale.
Is this the correct procedure? How does it relate to the terms above? I mean, how can a geocoding system be useful if I cannot store a simple latitude/longitude pair?
As far as I can tell, you are not supposed to store any geocoded data from Google in a database. I have the same problem on a Drupal site: the module I'm using stores latitude and longitude data in the database. If you look at the terms of service referenced on the page jmort253 pointed to in his answer, you'll find this:
(from https://developers.google.com/maps/terms#section_10_1_3)
10.1.3 Restrictions against Data Export or Copying.
(a) No Unauthorized Copying, Modification, Creation of Derivative
Works, or Display of the Content. You must not copy, translate,
modify, or create a derivative work (including creating or
contributing to a database) of, or publicly display any Content or any
part thereof except as explicitly permitted under these Terms. For
example, the following are prohibited: (i) creating server-side
modification of map tiles; (ii) stitching multiple static map images
together to display a map that is larger than permitted in the Maps
APIs Documentation; (iii) creating mailing lists or telemarketing
lists based on the Content; or (iv) exporting, writing, or saving the
Content to a third party's location-based platform or service.
(b) No Pre-Fetching, Caching, or Storage of Content. You must not
pre-fetch, cache, or store any Content, except that you may store: (i)
limited amounts of Content for the purpose of improving the
performance of your Maps API Implementation if you do so temporarily,
securely, and in a manner that does not permit use of the Content
outside of the Service; and (ii) any content identifier or key that
the Maps APIs Documentation specifically permits you to store. For
example, you must not use the Content to create an independent
database of "places" or other local listings information.
(c) No Mass Downloads or Bulk Feeds of Content. You must not use the
Service in a manner that gives you or any other person access to mass
downloads or bulk feeds of any Content, including but not limited to
numerical latitude or longitude coordinates, imagery, visible map
data, or places data (including business listings). For example, you
are not permitted to offer a batch geocoding service that uses Content
contained in the Maps API(s).
Parts b and c really make it sound like what you and I are trying to do is a no-no. Am I reading this wrong?
Google doesn't appear to have the restriction you mention. The Google Geocoding API documentation actually suggests caching or storing the content to reduce hits to their servers and improve performance.
As far as addresses go, your users are entering them, so you can store the addresses. The latitudes/longitudes can be retrieved from the Google Maps API service using those addresses.
It's okay to cache the results, so you only really need to look them up on the first search.
Geocoding converts an address into lat/long coordinates; reverse geocoding goes the other way, from coordinates back to an address. You could have the person entering the address specify it, and store the lat/long coordinates (returned from the Google or Yahoo Maps API) in the database for later use.
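To make that "look it up once, then reuse it" idea concrete, here is a minimal TypeScript sketch. The geocodeAddress helper is a placeholder for whatever provider call you make, and the in-memory Map stands in for the MySQL table; both names are illustrative, not part of any API.

```typescript
// Minimal sketch of "geocode once, then reuse" caching.
// geocodeAddress() is a placeholder for the provider call (Google, Yahoo, ...);
// the Map stands in for a MySQL lookup table keyed by normalized address.

interface LatLng {
  lat: number;
  lng: number;
}

const geocodeCache = new Map<string, LatLng>();

async function geocodeAddress(address: string): Promise<LatLng> {
  // Placeholder: call your geocoding provider here and parse its response.
  throw new Error(`geocoding provider not wired up for: ${address}`);
}

async function getCoordinates(address: string): Promise<LatLng> {
  const key = address.trim().toLowerCase();

  // Only hit the external API the first time we see an address.
  const cached = geocodeCache.get(key);
  if (cached) {
    return cached;
  }

  const coords = await geocodeAddress(address);
  geocodeCache.set(key, coords); // on a real site this would be an INSERT into MySQL
  return coords;
}
```

Whether storing the result long-term is permitted is exactly the terms-of-service question discussed above; the sketch only shows the mechanics.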
Related
I want to integrate search into my website, e.g. typing in a movie name and returning the data I have on that movie (could be about 20 different numbers or strings).
I don't care if people can somehow see the database. I just want that data to be returnable when someone searches for it. I DON'T want that data to ever be changed by a user. Let's say that the database is of around 50,000 movies.
I don't have many resources to run this website and server, so I would like to keep server costs down.
What would be the cheapest ways of running this kind of website? i.e. client-side search, server-side search, what hosting service?
I came across pouchdb and watermelondb, which provide an offline database. This would be quite nice if it's not too costly.
Any relevant tutorials or guides would also be very much appreciated.
This is more of an infrastructure question than a React one, but given that your data and site aren't changing, there are some solid approaches for getting cheap hosting.
Let's assume you're using create-react-app, so you can easily build into a static deployment. You can put your site into an S3 bucket and then just pay when people GET things out of it, which would be quite cheap.
You'll want to keep your data someplace else; that way users can fetch your site quickly, then let the underlying data load separately. You could put it into another S3 bucket, and bam, you've got a static site with a static data source -- all for cheap. You wouldn't want to load the entire database at once, so maybe you:
* Make a dedicated file which just has all the names, so the client can load that and then autocomplete any name available.
* Group your data into separate files of a smaller size, in some way that you can immediately get the group you need. The most basic answer would just be alphabetical chunks.
Note that S3 really isn't a database, it's just a place to permanently store data. It doesn't do writes very well; this solution only works because your movie list isn't changing.
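Here is a rough TypeScript sketch of that chunking idea on the client. The bucket URL and file names (names.json, movies-a.json, ...) are assumptions for illustration, not a prescribed layout.

```typescript
// Sketch: load the small "all names" index for autocomplete, then fetch only
// the alphabetical chunk that should contain the requested movie.
// The bucket URL and file layout are illustrative assumptions.

const DATA_BASE_URL = "https://example-bucket.s3.amazonaws.com";

interface Movie {
  title: string;
  year: number;
  // ...roughly 20 other numbers or strings
}

async function loadAllTitles(): Promise<string[]> {
  const res = await fetch(`${DATA_BASE_URL}/names.json`);
  return res.json(); // drives the autocomplete box
}

async function findMovie(title: string): Promise<Movie | undefined> {
  // Pick the chunk by first letter, e.g. "Alien" -> movies-a.json.
  const letter = title.trim().charAt(0).toLowerCase();
  const res = await fetch(`${DATA_BASE_URL}/movies-${letter}.json`);
  const chunk: Movie[] = await res.json();
  return chunk.find((m) => m.title.toLowerCase() === title.toLowerCase());
}
```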
Here's a tutorial on hosting a React app through S3 to help get you started: https://medium.com/dailyjs/a-guide-to-deploying-your-react-app-with-aws-s3-including-https-a-custom-domain-a-cdn-and-58245251f081
I am wondering if someone can provide some insight about an approach for Google Maps. Currently I am developing a visualization with the Google Maps API v3. This visualization will map out polygons for country, state, zip code, city, etc., as well as plot 3 other markers (balloon, circle, ...). The data is dynamically driven by an underlying report which can have filters applied and can be drilled down to many levels. The biggest problem I am running into is dynamically rendering the polygons. The data necessary to generate a polygon with Google Maps v3 is large, and it requires a good deal of processing at runtime.
My thought is that, since my visualization will never let the user return very large data sets (e.g. all zip codes for the USA), I could use dynamically created Fusion Tables.
Let's say each run of my report returns 50 states or 50 zip codes. Users can drill from state > zip.
On the first run of the visualization, users will run a report and it will return the state names and 4 metrics. Would it be possible to dynamically create a Fusion Table based on this information? Would I be able to pass through 4 metrics and formatting for all of the different markers to be drawn on the map?
On the second run, the user will drill from state to zip code. The report will then return 50 zip codes and 4 metrics. Could the initial table be dropped and another table created to draw a map with the same requirements as above, providing the Fusion Table zip codes (22054, 55678, ...) and 4 metric values and formatting?
Sorry for being long winded. Even after reading the fusion table documentation I am not 100% certain on this.
Fully-hosted solution
If you can upload the full dataset and get Google to do the drill-down, you could check out the Google Maps Engine platform. It's built to handle big sets of geospatial data, so you don't have to do the heavy lifting.
Product page is here: http://www.google.com/intl/en/enterprise/mapsearth/products/mapsengine.html
API doco here: https://developers.google.com/maps-engine/
Details on hooking your data up with the normal Maps API here: https://developers.google.com/maps/documentation/javascript/mapsenginelayers
Dynamic hosted solution
However, since you want to do this dynamically it's a little trickier. Neither the Fusion Tables API nor the Maps Engine API currently supports table creation via the API, so your best option is to model your data in a consistent schema so you can create your table (in either platform) ahead of time and use the API to upload and delete data on demand.
For example, you could create a table in MapsEngine ahead of time for each drill-down level (e.g. one for state, one for zip-code) & use the batchInsert method to add data at run-time.
If you prefer Fusion Tables, you can use insert or importRows.
Client-side solution
The above solutions are fairly complex & you may be better off generating your shapes using the Maps v3 API drawing features (e.g. simple polygons).
If your data mapping is quite complex, you may find it easier to bind your data to a Google Map using D3.js. There's a good example here. Unfortunately, this does mean investigating yet another API.
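As a small illustration of the "simple polygons" route mentioned above, here is a bare-bones sketch of drawing one polygon with the Maps v3 drawing features. The coordinates and styling are made up for illustration; google.maps.Polygon itself is the standard Maps v3 class.

```typescript
// Bare-bones Maps v3 polygon: define a path of lat/lng literals and attach
// a google.maps.Polygon to an existing map. Coordinates are illustrative only.

declare const google: any; // provided by the Maps JavaScript API script tag

function drawRegionPolygon(map: any): void {
  const path = [
    { lat: 38.9, lng: -77.2 },
    { lat: 39.1, lng: -76.8 },
    { lat: 38.7, lng: -76.6 },
    { lat: 38.6, lng: -77.1 },
  ];

  const polygon = new google.maps.Polygon({
    paths: path,
    strokeColor: "#1a73e8",
    strokeWeight: 2,
    fillColor: "#1a73e8",
    fillOpacity: 0.25,
  });

  polygon.setMap(map); // remove later with polygon.setMap(null) when drilling down
}
```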
I'm working on a location-based web app (for learning purposes), where users will rate local businesses. I then want users to be able to see local businesses based on where they live and a given range (e.g., businesses within 10 miles of 123 Street, City, ST 12345).
I'm wondering what I should use for location info: a third-party source (like Google's Geocoding API) or hosting my own location database? I know of zip-code databases that come with the lat/lon of each zip code along with other data, but these databases are often incomplete, and definitely not global.
I know that most APIs set usage limits, which may be a deciding factor. I suppose what I could do is store all data retrieved from Google in a database of my own, so that I never make the same query twice.
What do you guys suggest? I've tried looking at existing answers on SO, but nothing helped.
EDIT: To be clear, I need a way to find all businesses that fall within a certain range of a given location. Is there a service I could use to do this (i.e., return all cities, zips, etc. that fall within range of a given location)?
Storing the data you retrieve in a local cache is always a good idea. It will reduce lag and keep from taxing whatever API you are using. It can also help keep you under usage limits as you stated. You can always place size limits on that cache and clear it out as it ages if the need arises.
Using an API means that you'll only be pulling data for sites you need information on, versus buying a bunch of data and having to load/host it all yourself (these data sets tend to get huge). I suggest using an API plus caching.
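For the "place size limits on that cache and clear it out as it ages" idea, here is a small TypeScript sketch of a bounded cache with expiry. The size limit and TTL are arbitrary examples, not recommended values.

```typescript
// Sketch of a small bounded cache with an expiry, so cached API responses
// age out instead of growing without limit. The numbers are arbitrary examples.

interface CacheEntry<T> {
  value: T;
  storedAt: number;
}

class BoundedCache<T> {
  private entries = new Map<string, CacheEntry<T>>();

  constructor(private maxSize = 10_000, private ttlMs = 24 * 60 * 60 * 1000) {}

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;

    // Drop entries that have aged past the TTL.
    if (Date.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    // Evict the oldest entry once the size limit is reached
    // (Map preserves insertion order, so the first key is the oldest).
    if (this.entries.size >= this.maxSize) {
      const oldestKey = this.entries.keys().next().value;
      if (oldestKey !== undefined) this.entries.delete(oldestKey);
    }
    this.entries.set(key, { value, storedAt: Date.now() });
  }
}
```

In practice the cache could just as easily live in a database table with a stored_at column; the eviction logic stays the same.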
I'm building a site where my users will be able to specify locations (say, their residence, etc.). Then, I want to do 2 things with this information:
Plot these locations on a mapping service (such as Google Maps)
Allow users to search by location (e.g. find all users that live in or are in a certain radius from XYZ city)
The question I have is this: what is the best way for me to implement such features?
My concern is that my database of location information may or may not be in sync with the mapping service I use.
For example, say I have a list of cities and a user picks XYZ city from my list. Later, it turns out that city is not recognized by the mapping service. No way to plot the location on the map (but I could still provide the search by location feature).
If I try to use the database of the mapping service (if I can get a hold of something like that), I may end up being "locked in" to using that mapping service. Plus, these databases tend to be HUGE (and probably too much information for a simple application like mine).
Any recommendations on how to go about solving this problem? Thanks.
You should not rely only on cities, but take the coordinates into account (watch out for different mappings of the world, i.e. datums, as these can change the coordinates). If you use the coordinates, you are independent of the service, because x,y will always be x,y no matter the application or geo-provider. With the help of reverse geocoding services, you should be able to build something like that pretty easily.
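For the "find users within a certain radius of XYZ" part, here is a small, provider-independent TypeScript sketch using the haversine formula on stored coordinates. The Location shape and mile-based radius are illustrative assumptions.

```typescript
// Haversine distance between two coordinates, used to filter stored
// locations by radius. Works on plain lat/lng, independent of any map provider.

interface Location {
  name: string;
  lat: number;
  lng: number;
}

function haversineMiles(a: Location, b: Location): number {
  const R = 3958.8; // mean Earth radius in miles
  const toRad = (deg: number) => (deg * Math.PI) / 180;

  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;

  return 2 * R * Math.asin(Math.sqrt(h));
}

function withinRadius(center: Location, all: Location[], radiusMiles: number): Location[] {
  return all.filter((loc) => haversineMiles(center, loc) <= radiusMiles);
}
```

For larger datasets you would typically pre-filter with a bounding box in the database before computing exact distances, but the idea is the same.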
This is a very biased opinion: if you stick to, say, Google, you can be quite sure they won't shut down the service in the next 1-2 years (maybe if there is some new uber-technology, but I doubt that very much). So I think it is OK to stick to a single service if you trust it and are confident it will live for more than a few years.
Does anybody know how the data in Google Analytics is organized? They perform difficult selections over large amounts of data very, very fast; what kind of database structure is that?
AFAIK, Google Analytics is derived from Urchin. As has been said, it is possible that, now that Analytics is part of the Google family, it uses MapReduce/BigTable. I can assume that Google has integrated the old Urchin DB format with the new BigTable/MapReduce stack.
I found these links, which talk about the Urchin DB. Some of this is probably still in use at the moment.
http://www.advanced-web-metrics.com/blog/2007/10/16/what-is-urchin/
this says:
[snip] ...still use a proprietary database to store reporting data, which makes ad-hoc queries a bit more limited, since you have to use Urchin-developed tools rather than the more flexible SQL tools.
http://www.urchinexperts.com/software/faq/#ques45
What type of database does Urchin use?
Urchin uses a proprietary flat file database for report data storage. The high-performance database architecture handles very high traffic sites efficiently. Some of the benefits of the data base architecture include:
* Small database footprint approximately 5-10% of raw logfile size
* Small number of database files required per profile (9 per month of historical reporting)
* Support for parallel processing of load-balanced webserver logs for increased performance
* Databases are standard files that are easy to back up and restore using native operating system utilities
More info about Urchin
http://www.google.com/support/urchin45/bin/answer.py?answer=28737
A long time ago I used to have a tracker, and on their site they discussed data normalization: http://www.2enetworx.com/dev/articles/statisticus5.asp
There you can find a bit of info on how to reduce the data in the DB; it may be a good starting point for research.
BigTable
Google publication: Chang, Fay, et al. "Bigtable: A Distributed Storage System for Structured Data." ACM Transactions on Computer Systems (TOCS) 26.2 (2008):
Bigtable is used by more than sixty Google products and projects,
including Google Analytics, Google Finance, Orkut, Personalized
Search, Writely, and Google Earth.
I'd assume they use their BigTable.
I can't know exactly how they implement it, but because I've made a product that extracts non-sampled, non-aggregated data from Google Analytics, I have learned a thing or two about the structure.
It makes sense that the data is populated via BigTable.
BigTable offers data-locality awareness and map/reduce querying across n nodes.
Distinct counts
(Whether a data service can provide distinct counts or not is a simple measure of flexibility of a data model - but it's typically also a measure of cost and performance)
Google Analytics is not built to do distinct counts. GA can count users across almost any dimension, but it can't count, e.g., sessions per ga:pagePath.
How so?
Well, they only register a session with the first pageView in a session.
This means that we can only count how many landing pages have had a session.
We have no count for all the other 99% of pages on your site. :/
The reason for this is that Google chose NOT to do distinct counts at all. It simply doesn't scale well economically when serving millions of sites for free.
They needed an approach where they could avoid counting distinct values. Distinct counting is all about sorting and grouping lists of ids for every cell in the data intersection.
But...
Isn't it simple to count the distinct number of sessions for a ga:pagePath value?
I'll answer this in a bit.
The User and data partitioning
The choice they made was to partition data on users (clientIds or userIds).
When they know that clientId/userId X is only present in a certain table in BigTable, they can run a map/reduce function that counts users without worrying that the same user is present in another dataset, and without being forced to store all clientIds/userIds in a list, group them, and then count them distinctly.
Since the current GA tracking script is called Universal Analytics, they have to be able to count users correctly, especially when focusing on cross-device tracking.
OK, but how does this affect session count?
You have a set of users, each having multiple sessions, each having a list of page hits.
When counting within a specific session and looking for pagePaths, you will find the same page multiple times, but you must not count the page more than once.
You need to remember that you've already seen this page before.
When you have traversed all pages within that session, you count the session only once per page. This procedure requires state/memory. And since the counting process is probably done in parallel on the same server, you can't be sure that a specific session is handled by the same process, which makes the counting even more memory-consuming.
Google decided not to chase that rabbit any further and simply ignores that the session count is wrong for pagePath and other hit-scoped dimensions.
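To illustrate why this is the expensive part, here is a naive TypeScript sketch of a distinct session count per pagePath: every pagePath has to hold on to the full set of session ids it has already seen until all hits are processed. The Hit shape is a simplification for the example, not GA's actual schema.

```typescript
// Naive distinct count: sessions per pagePath. Each pagePath must remember
// every session id it has already seen, which is exactly the state that
// does not scale cheaply across parallel workers.

interface Hit {
  sessionId: string;
  pagePath: string;
}

function sessionsPerPagePath(hits: Hit[]): Map<string, number> {
  const seen = new Map<string, Set<string>>();

  for (const hit of hits) {
    let sessions = seen.get(hit.pagePath);
    if (!sessions) {
      sessions = new Set<string>();
      seen.set(hit.pagePath, sessions);
    }
    sessions.add(hit.sessionId); // a Set per pagePath: this is the memory cost
  }

  const counts = new Map<string, number>();
  for (const [path, sessions] of seen) {
    counts.set(path, sessions.size);
  }
  return counts;
}
```

Counting users instead avoids this, because the data is already partitioned by clientId/userId and each partition can be counted independently.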
"Cube" storage
The reason I write "cube" is that I don't know exactly whether they use a traditional OLAP cube structure, but I know they have up to 100 cubes populated for answering different dimension/metric combinations.
By isolation/grouping dimensions in smaller cubes, data won't explode exponentially like it would if they put all data in a single cube.
The drawback is that not all data combinations are allowed, which we know is true.
E.g. ga:transactionId and ga:eventCategory can't be queried together.
By choosing this structure, the dataset can scale well both economically and performance-wise.
Many places and applications in the Google portfolio use the MapReduce algorithm for storage and processing of large quantities of data.
See the Google Research Publications on MapReduce for further information and also have a look at page 4 and page 5 of this Baseline article.
Google Analytics runs on Mesa ("Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing").
https://storage.googleapis.com/pub-tools-public-publication-data/pdf/42851.pdf
"Mesa is a highly scalable analytic data warehousing systemthat stores critical measurement data related to Google’sInternet advertising business."