Storing 100k map markers in App Engine - google-app-engine

I'm designing yet another "Find Objects near my location" web site and mobile app.
My requirements are:
Store up to 100k objects;
Query for objects that are close to the point (my location, city, etc). And other search criteria (like object type);
Display results on the Google Maps with smooth performance.
Let user filter objects by object time.
I'm thinking about using Google App Engine for this project.
Could you recommend the best data storage option for this?
And a couple of words about a dynamic data loading strategy.
I feel kind of overwhelmed with options at the moment and am looking for hints on where I should continue my research.
Thanks a lot!

I'm going to assume that you are using the datastore. I'm not familiar with Google Cloud SQL (which I believe aims to offer MySQL-like features in the cloud), so I can't say whether it can do geospatial queries.
I've been looking into the whole "get locations in proximity of a location" problem for a while now. I have some good and bad news for you, unfortunately.
The best way to do a proximity search in the Google environment is via the Search Service (https://developers.google.com/appengine/docs/python/search/, or find the Java link). The reason is that it supports a "Geopoint Field" and lets you query on it.
Ok, cool, so there is support, right? However, "A query is complex if its query string includes the name of a geopoint field or at least one OR or NOT boolean operator". The free quota for Complex Search Queries is 100/day. Beyond that, it costs 60 cents per 10,000 queries. Depending on your application, this may be an issue.
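To get a feel for whether that pricing matters, here is a rough back-of-the-envelope sketch. It uses only the two figures quoted above (100 free complex queries per day, 60 cents per 10,000 queries) and assumes, without confirmation from the pricing docs, that the free quota is subtracted first and that the remainder is billed pro rata. The function name is made up for illustration.

```python
# Figures quoted in the answer above; actual pricing may differ or have changed.
FREE_COMPLEX_QUERIES_PER_DAY = 100
PRICE_PER_10K_QUERIES_USD = 0.60


def daily_cost_usd(queries_per_day):
    """Estimate the daily cost of complex (geo) search queries.

    Assumption: the free quota is deducted first and the rest is
    billed proportionally, not in whole blocks of 10,000.
    """
    billable = max(0, queries_per_day - FREE_COMPLEX_QUERIES_PER_DAY)
    return billable / 10000.0 * PRICE_PER_10K_QUERIES_USD


if __name__ == "__main__":
    for n in (100, 10100, 1000100):
        print(n, "queries/day ->", round(daily_cost_usd(n), 2), "USD/day")
```

Under those assumptions, a site doing about ten thousand map searches a day pays well under a dollar a day, while a million searches a day lands around 60 dollars a day.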
I'm not too familiar with the Google Maps API, but you might be able to pull off something like this: https://developers.google.com/maps/articles/phpsqlsearch_v3
My current project/problem involves moving locations, not "static" ones (stores, landmarks, etc.). I've decided to go with Amazon's DynamoDB, which has a library that supports geospatial indexing: http://aws.amazon.com/about-aws/whats-new/2013/09/05/announcing-amazon-dynamodb-geospatial-indexing/

Related

Spatial Search Objectify, appengine

I want to use Objectify for spatial search. I have entities that have a longitude and latitude associated with them. The latitude and longitude information is dynamic, e.g. service providers (like electricians or carpenters) in a city. I want to implement a query that gives me the service providers offering some specific service within a 1 km radius. Searching on Google reveals the following options:
Use Objectify with geohashes - Not sure how accurate and scalable this solution is
Use Google Search - It will need entities (or parts of them) duplicated in the form of documents. Will it be able to support dynamically updated locations?
Use another database like MongoDB
Assuming a few million entities with dynamically updated latitude/longitude, please suggest an appropriate option.
thanks
Ittium
I've used geohashes. It works, although you end up selecting more data than the exact bounds you are looking for and then filtering out the extra. This might or might not be a good solution depending on your specific application. It requires writing more code but has fewer moving parts (all in the datastore).
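The "select more than you need, then filter" pattern described above can be sketched in plain Python. This is an in-memory illustration, not Objectify or datastore code: the point dictionaries, the `nearby` helper and the chosen precision are all assumptions made for the example. In the datastore the `startswith` check would typically become a range filter on an indexed geohash string property. The sketch also only looks at the single cell containing the query point; a real implementation would need to query the neighbouring cells too, or it will miss points just across a cell border.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lng, precision=6):
    """Encode a lat/lng pair as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    even = True  # even-numbered bits refine longitude, odd ones latitude
    while len(chars) < precision:
        if even:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits, lng_lo = (bits << 1) | 1, mid
            else:
                bits, lng_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearby(points, lat, lng, radius_km, precision=5):
    """Over-select by geohash prefix, then filter by exact distance.

    `points` is a list of dicts with 'lat', 'lng' and a precomputed
    'geohash' key (hypothetical layout for this sketch).
    """
    prefix = geohash_encode(lat, lng, precision)
    # Step 1: coarse selection, cheap but returns extra points.
    candidates = [p for p in points if p["geohash"].startswith(prefix)]
    # Step 2: exact filtering of the over-selected candidates.
    return [p for p in candidates
            if haversine_km(lat, lng, p["lat"], p["lng"]) <= radius_km]
```

The precision controls the trade-off: shorter prefixes mean bigger cells, so more extra rows are fetched and thrown away, while longer prefixes mean more neighbouring cells have to be queried to cover the same radius.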
Google search and "other database" are basically the same architectural pattern - use the task queue to replicate updates to an external index. If you want a quick solution, the search service is probably the easiest to wrap your head around.
Just pick one solution and run with it for a while. You can always reindex the data into a different solution.
It really depends on your query rate, but I usually prefer to use Google search. Building and maintaining docs is pretty simple, and you get a separate quota to handle these queries.

Geospatial Database Cloud Server

Are there any cloud hosting solutions for geospatial data? I am currently writing a directory style app where businesses can sign up and then users can find nearby ones.
I am considering Google App Engine for this, but from what I can tell the GeoModel code is quite expensive (up to tens of thousands of dollars a year) to run since Google updated the pricing of App Engine. It doesn't seem like App Engine's database is really suited to this kind of query (though the SQL solution may be an answer).
I was hoping to find a service where I could send off an HTTP request to add data (a business's id, name and icon url) to a database, and then another one to find a list of businesses that are near a given point. A service is preferable as this is work done for a client and we would like the solution to be managed with as little interaction from us or the client as possible.
EDIT:
I just found cartodb.com which uses PostgreSQL and is reasonably priced. Are there any other alternatives?
The App Engine Search API (currently in Experimental) supports GeoPoints and geosearch, and is great for exactly the kind of query that you describe.
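For illustration, a geosearch with the Search API boils down to a query string that compares the distance between a geopoint field and a given point. The helper below only builds that string, so it runs anywhere; the field name `location` and the function name are assumptions for this sketch, and on App Engine the resulting string would be handed to the SDK's index search call.

```python
def nearby_query(field, lat, lng, radius_m):
    """Build a Search API query string for documents within radius_m
    metres of (lat, lng).

    `field` is the name of a geopoint field on the indexed documents
    (hypothetical here; use whatever name your documents define).
    """
    return "distance({}, geopoint({:.6f}, {:.6f})) < {:d}".format(
        field, lat, lng, int(radius_m))


if __name__ == "__main__":
    # Businesses within 2 km of a point in Sydney.
    print(nearby_query("location", -33.857, 151.215, 2000))
```

Note that a query like this names a geopoint field, so it counts as a "complex" query for quota purposes.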
See the Google Developers Academy (GDA) App Engine Search API classes for a bit more info and an example as well.
http://www.iriscouch.com/ is a cloud-based host for CouchDB and they support the geocouch extensions for CouchDB to store geoJSON data and perform spatial queries.
We have decided to go with cartodb.com because it looks like they have a good price to ease of use ratio.
You mentioned going with CartoDB, which is a good choice with a nice UI.
Just adding: if you were just looking for a scalable backend, you could use StormDB. It is a cloud-hosted SQL database with geospatial extensions. Your data is automatically distributed among multiple nodes for write, read, and parallel query scalability.

App Engine Full Text Search vs Geohashing for location queries

I'm thinking of porting an application from RoR to Python App Engine that is heavily geo search centric. I've been using one of the open source GeoModel (i.e. geohashing) libraries to allow the application to handle queries that answer questions like "what restaurants are near this point (lat/lng pair)" and things of that nature.
GeoModel uses a ListProperty which creates a heavy index, which had me concerned about pricing as I have about 10 million entities that would need to be loaded into production.
This article that I found this morning seems pretty scary in terms of costs:
https://groups.google.com/forum/?fromgroups#!topic/google-appengine/-FqljlTruK4
So my question is - is geohashing a moot concept now that Google has released their full text search which has support for geo searching? It's not clear what's going on behind the scenes with this new API though and I'm concerned the index sizes might be just as big as if I used the GeoModel approach.
The other problem with the search API is that it appears I'd have to create not only my models in the datastore but then replicate some of that data (GeoPtProperty and entity_key for the model it represents at a minimum) into Documents which greatly increases my data set.
Any thoughts on this? At the moment I'm contemplating scrapping this port as being too expensive, although I've really enjoyed working in the App Engine environment so far and would love to get away from EC2 for some of my applications.
You're asking many questions here:
is geohashing a moot concept: Probably not, I suspect the Search API uses geohashing, or something similar for its location search.
can you use the Search API vs implementing it yourself: yes, but I don't know the cost one way or the other.
is geohashing expensive on App Engine: in the message thread the cost is bad due to high index write costs. You'll have to engineer your geohashing data to minimize the indexing. If GeoModel puts a lot of indexed values in the list, you may be in trouble - I wouldn't use it directly without knowing how the indexing works. My guess is that if you reduce the location accuracy you can reduce the number of indexed entries, and that could save you a lot of cost.
As mentioned in the thread, you could have the geohashing run in CloudSQL.
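To make the guess about accuracy versus index size concrete, here is a small sketch. It assumes a scheme where each entity stores one list entry per precision level of its geohash (this is an illustrative layout, not a description of how GeoModel actually builds its list), so every stored prefix costs one index entry on write. The helper name and the sample geohash are only for illustration.

```python
def indexed_prefixes(geohash, min_len, max_len):
    """Return the geohash prefixes that would be stored (and indexed)
    for one entity, from min_len to max_len characters inclusive."""
    max_len = min(max_len, len(geohash))
    return [geohash[:n] for n in range(min_len, max_len + 1)]


if __name__ == "__main__":
    gh = "u4pruydqqvj"  # geohash of a sample point, 11 characters
    full = indexed_prefixes(gh, 1, 11)
    trimmed = indexed_prefixes(gh, 4, 6)
    print(len(full), "index entries per entity at full accuracy")
    print(len(trimmed), "index entries per entity when trimmed:", trimmed)
```

With 10 million entities, dropping from 11 stored prefixes to 3 would cut the list-property index writes by roughly the same factor, at the price of coarser cells and more client-side filtering.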

Google Cloud Storage performance and full-text search

I'm developing an application that serves data to the users (like we all, right?). It's crucial for the data to be full-text searchable. For now, we store over 30 million records, some of them searchable only on one field but some of them searchable on a few fields.
I'm considering Google cloud solutions, but I'm very new to their storage technology. Googling the full-text search for their cloud brings results like: "it's experimental", "there are some workarounds" etc.
Could someone tell me from their own experience what is possible, and what the performance of full-text search is in Google Cloud Storage solutions?
Thx in advance,
trzewiczek
As far as I know, full-text search for appengine (and Google Cloud Storage) isn't available yet.
There is a blog post here about it:
http://googleappengine.blogspot.com/2011/10/app-engine-155-sdk-release.html
And a form here to register your interest, but you have to have a proper app to apply it to:
https://docs.google.com/spreadsheet/viewform?formkey=dEdWcnRJUXZ2VGR3YmVsT1Q1WVB2Smc6MQ&ndplr=1
To at least try and answer your question though, like, it's Google, so it's gonna be fast isn't it? They have been holding it back for a long time, presumably because it's not up to the task yet, but hopefully that will turn out to be a good thing in the long run.

Confused about Google App Engine and Google Docs options

I want to use the Google App Engine to store my data and then query/display/ edit it using Google Spreadsheets as the user interface, with multiple concurrent users having their own view of the data. The problem I have now is that if I put everyone's data on the same Google Spreadsheet that everyone accesses, we can't each do sorting / filtering at the same time.
Is there a way to do this, and is it a good idea to build a simple system this way? I'll eventually need to query a series of Google Word Processor documents as well.
Can someone point me in the right direction on this or suggest other options?
I would ask what the advantage of doing something like this is as opposed to say hosting your application on Google App Engine and building a javascript front end with grids to help sort/filter and view data.
Anyway, to answer your questions: you can build your interface over Google Spreadsheets using Google Apps Script. This will allow you to do things like authenticate your user, and query, update and display data. If you merely want to display data, it turns out that Google Spreadsheets has some built-in functions to do that.
Regarding consistency you should read up on GAE's Datastore as well as its features like transactions. The datastore is not an RDBMS, but is an object database which stores objects against keys. Again something to consider if you are planning to do a lot of data analysis and computation (summations, aggregations).
Overall I would recommend doing a rough design of your system without fixing on particular technologies (like GAE, and Google Spreadsheets). Once you identify what your key goals are for your application, then you can figure out which technologies and resources would make the most sense within your budget.
