Google Cloud Storage performance and full-text search - google-app-engine

I'm developing an application that serves data to users (like we all do, right?). It's crucial for the data to be full-text searchable. We currently store over 30 million records; some are searchable on only one field, others on several fields.
I'm considering Google Cloud solutions, but I'm very new to their storage technology. Searching for full-text search on their cloud turns up results like "it's experimental", "there are some workarounds", etc.
Could someone tell me, from their own experience, what is possible and how well full-text search performs in Google Cloud Storage solutions?
Thx in advance,
trzewiczek

As far as I know, full-text search for App Engine (and Google Cloud Storage) isn't available yet.
There is a blog post here about it:
http://googleappengine.blogspot.com/2011/10/app-engine-155-sdk-release.html
And there is a form here to register your interest, although you need an existing app to apply it to:
https://docs.google.com/spreadsheet/viewform?formkey=dEdWcnRJUXZ2VGR3YmVsT1Q1WVB2Smc6MQ&ndplr=1
To at least try to answer your question, though: it's Google, so I'd expect it to be fast, but that is a guess rather than experience. They have been holding it back for a long time, presumably because it's not up to the task yet, but hopefully that will turn out to be a good thing in the long run.

Related

Azure Cognitive Search migration to another product?

We have some products that rely heavily on Azure Cognitive Search. It is a good product, but over time we have run into quite a few problems with it, including:
You can't scale up or scale down without deleting the whole instance.
You can't create or update indexes from a pipeline unless you call the Web API. Modifying or deleting a field in an index is also not straightforward.
No data replication between search instances.
No cross index query even in the same instance.
No case-insensitive search.
Suggestions for all of the above have been sitting on Microsoft's suggestion site for years, and nothing was ever done to address them. I have no idea when, or whether, Microsoft will bother to improve the service.
As a result, I am starting to look around for alternative products (looking at Elasticsearch at the moment). Is there any product that supports search syntax translation to make the migration easier (so we don't have to break so many things)?

Azure Search - size maxed - any options?

Azure Search service maxes out at 300GB of data. As of today, we've exceeded that. Our database table consists mainly of unstructured text from website news articles around the world.
Do we have any options at all here? We like Azure Search and have built our entire back-end infrastructure around it, but now we're dead in the water with being able to add any more documents to it. Does Azure Search allow compression on the documents?
Azure Search offers a variety of SKUs. The biggest one allows you to index up to 2.4 TB per service. You can find more details here.
Note, changing SKUs requires re-indexing the data.
We don't provide data compression. If you'd like to talk to Azure Search program managers about your capacity requirements, feel free to reach out to #liamca.

Storing 100k map markers in App Engine

I'm designing yet another "Find Objects near my location" web site and mobile app.
My requirements are:
Store up to 100k objects;
Query for objects that are close to the point (my location, city, etc). And other search criteria (like object type);
Display results on Google Maps with smooth performance.
Let users filter objects by object time.
I'm thinking about using Google App Engine for this project.
Could you recommend the best data storage option for this?
A few words about a dynamic data loading strategy would also help.
I feel a bit overwhelmed by the options at the moment and am looking for hints on where to continue my research.
Thanks a lot!
I'm going to assume that you are using the datastore. I'm not familiar with Google Cloud SQL (which I believe aims to offer MySQL-like features in the cloud), so I can't say whether it can do geospatial queries.
I've been looking into the whole "get locations in proximity of a location" problem for a while now. I have some good and bad news for you, unfortunately.
The best way to do proximity search in the Google environment is via the Search Service (https://developers.google.com/appengine/docs/python/search/, or the equivalent Java page). The reason is that it supports a "Geopoint Field" and lets you query against it.
Ok, cool, so there is support, right? However, "A query is complex if its query string includes the name of a geopoint field or at least one OR or NOT boolean operator". The free quota for Complex Search Queries is 100/day; beyond that, it costs 60 cents per 10,000 queries. Depending on your application, this may be an issue.
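To get a feel for what those numbers mean, here is a rough daily cost estimate. It is a minimal sketch that assumes the rates quoted above (100 free complex queries per day, then $0.60 per 10,000) and simple per-query billing; the function name is hypothetical, and these prices date from when this answer was written and may well have changed.

```python
def daily_complex_query_cost(queries_per_day: int,
                             free_per_day: int = 100,
                             usd_per_10k: float = 0.60) -> float:
    """Estimate the daily USD cost of complex Search API queries.

    Assumes the rates quoted in the answer (free_per_day queries free,
    then usd_per_10k per 10,000 queries) and linear per-query billing.
    """
    # Only queries above the free daily quota are billed.
    billable = max(0, queries_per_day - free_per_day)
    return billable * usd_per_10k / 10_000


if __name__ == "__main__":
    for q in (50, 10_100, 1_000_100):
        print(f"{q:>9} queries/day -> ${daily_complex_query_cost(q):.2f}/day")
```

Under these assumptions, 10,100 complex queries a day costs about $0.60 and 1,000,100 a day about $60, so an app that runs a geopoint query on every map interaction can add up quickly.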
I'm not too familiar with the Google Maps API, but you might be able to pull off something like this: https://developers.google.com/maps/articles/phpsqlsearch_v3
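The linked article filters stored locations by their distance from a point inside the SQL query. As a rough illustration of that kind of radius search, independent of any particular database, here is a brute-force Python sketch using the Haversine great-circle formula. The `Place` type, the `nearby` function and its fields are hypothetical; the optional `kind` filter stands in for "other search criteria (like object type)". It checks every record on every call, so with up to 100k objects you would normally want an index (such as the Search API geopoint field mentioned above) rather than a full scan.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Place:
    """Hypothetical record type: one map marker."""
    name: str
    kind: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(places: Iterable[Place], lat: float, lon: float,
           radius_km: float, kind: Optional[str] = None) -> List[Place]:
    """Return places within radius_km of (lat, lon), nearest first.

    Linear scan: every place is checked on every call.
    """
    hits = []
    for p in places:
        if kind is not None and p.kind != kind:
            continue  # optional type filter
        d = haversine_km(lat, lon, p.lat, p.lon)
        if d <= radius_km:
            hits.append((d, p))
    hits.sort(key=lambda pair: pair[0])  # sort by distance only
    return [p for _, p in hits]
```

For example, `nearby(places, 52.23, 21.01, radius_km=5, kind="cafe")` would return the hypothetical "cafe" places within 5 km of that point, nearest first.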
My current project/problem involves moving locations, not "static" ones (stores, landmarks, etc.). I've decided to go with Amazon's DynamoDB, and they have a library which supports geospatial indexing: http://aws.amazon.com/about-aws/whats-new/2013/09/05/announcing-amazon-dynamodb-geospatial-indexing/

Geospatial Database Cloud Server

Are there any cloud hosting solutions for geospatial data? I am currently writing a directory style app where businesses can sign up and then users can find nearby ones.
I am considering Google App Engine for this, but from what I can tell the GeoModel code is quite expensive (up to tens of thousands of dollars a year) to run since Google updated the pricing of App Engine. It doesn't seem like App Engine's database is really suited to this kind of query (though the SQL solution may be an answer).
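Libraries like GeoModel work around the fact that a key-value datastore cannot directly answer "what is near this point?". The usual trick is to index each record under the grid cell(s) that contain it, so a proximity query becomes an equality or prefix lookup. GeoModel uses its own "geocell" scheme; the sketch below swaps in the similar, widely documented geohash encoding to illustrate the idea and is not GeoModel's code. Points that are close together usually share a geohash prefix, but points on either side of a cell boundary may not, so real implementations also look up neighbouring cells and then filter the candidates by exact distance (neither step is shown here).

```python
# Geohash base-32 alphabet (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a lat/lon pair as a geohash of `precision` characters.

    Bits alternate between longitude and latitude, starting with
    longitude; each bit halves the current interval, and every 5 bits
    become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # bits accumulated for the current character
    n_bits = 0      # number of bits currently in `bits`
    use_lon = True  # the first bit refines longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # emit one character per 5 bits
            chars.append(_BASE32[bits])
            bits = 0
            n_bits = 0
    return "".join(chars)
```

For example, `geohash_encode(42.6, -5.6, 5)` and `geohash_encode(42.61, -5.59, 5)` both give `"ezs42"`, so a record could store its 5-character prefix (cells a few kilometres across) as an indexed property and a nearby search could start by fetching that cell.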
I was hoping to find a service where I could send an HTTP request to add data (a business' id, name and icon URL) to a database, and then another one to find a list of businesses near a given point. A service is preferable because this is work done for a client, and we would like the solution to be managed with as little interaction from us or the client as possible.
EDIT:
I just found cartodb.com, which uses PostgreSQL and is reasonably priced. Are there any other alternatives?
The App Engine Search API (currently experimental) supports GeoPoints and geosearch, and is great for exactly the kind of query that you describe.
See the Google Developers Academy (GDA) App Engine Search API classes for a bit more info and an example as well.
http://www.iriscouch.com/ is a cloud-based host for CouchDB and they support the geocouch extensions for CouchDB to store geoJSON data and perform spatial queries.
We have decided to go with cartodb.com because it looks like they have a good price to ease of use ratio.
You mentioned going with CartoDB, which is a good choice with a nice UI.
Just to add: if you were only looking for a scalable backend, you could use StormDB. It is a cloud-hosted SQL database with geospatial extensions. Your data is automatically distributed across multiple nodes for write, read, and parallel query scalability.

Confused about Google App Engine and Google Docs options

I want to use the Google App Engine to store my data and then query/display/ edit it using Google Spreadsheets as the user interface, with multiple concurrent users having their own view of the data. The problem I have now is that if I put everyone's data on the same Google Spreadsheet that everyone accesses, we can't each do sorting / filtering at the same time.
Is there a way to do this, and is it a good idea to build a simple system this way? I'll eventually need to query a series of Google Word Processor documents as well.
Can someone point me in the right direction on this or suggest other options?
I would first ask what the advantage of doing this is over, say, hosting your application on Google App Engine and building a JavaScript front end with grids for sorting, filtering and viewing data.
Anyway, to answer your questions: you can build your interface over Google Spreadsheets using Google Apps Script. This will let you authenticate your users and query, update and display data. If you only want to display data, Google Spreadsheets has some built-in functions for that.
Regarding consistency, you should read up on GAE's Datastore and features such as transactions. The datastore is not an RDBMS but an object database that stores objects against keys. Again, this is something to consider if you plan to do a lot of data analysis and computation (summations, aggregations).
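To make the point about aggregations concrete: at least at the time of this answer, the datastore had no SQL-style GROUP BY, so sums and other aggregates had to be computed in your own code, either by scanning entities at query time or by keeping running totals up to date as entities are written. The sketch below shows both options in plain Python, using hypothetical `(key, category, amount)` tuples and a dict as a stand-in for the datastore; it is not the datastore API. In a real datastore, the entity write and the total update would need to happen together, for example in a transaction.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical entity shape: (key, category, amount), standing in for
# objects stored against keys.
Entity = Tuple[str, str, float]


def totals_at_query_time(entities: Iterable[Entity]) -> Dict[str, float]:
    """Option 1: scan every entity and sum amounts per category."""
    totals: Dict[str, float] = defaultdict(float)
    for _key, category, amount in entities:
        totals[category] += amount
    return dict(totals)


class RunningTotals:
    """Option 2: keep per-category totals current on every write."""

    def __init__(self) -> None:
        self.store: Dict[str, Entity] = {}  # key -> entity
        self.totals: Dict[str, float] = defaultdict(float)

    def put(self, key: str, category: str, amount: float) -> None:
        old = self.store.get(key)
        if old is not None:
            # Overwriting an entity: remove its old contribution first.
            self.totals[old[1]] -= old[2]
        self.store[key] = (key, category, amount)
        self.totals[category] += amount
```

Scanning is simple but costs a full read on every report; running totals make reads cheap at the price of extra work, and care, on every write.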
Overall I would recommend doing a rough design of your system without fixing on particular technologies (like GAE, and Google Spreadsheets). Once you identify what your key goals are for your application, then you can figure out which technologies and resources would make the most sense within your budget.
