Approximate search in Google App Engine

I am currently working on a solution for searching brand names. So far we have about 10M different brands, and we are using the Google Cloud Search API. We index the 3-grams of each brand name; for a user query we extract its 3-grams the same way and then search for documents containing all of them.
What we would like to do is find not only documents containing all the 3-grams but also documents containing at least one, and sort the results by the number of matches. Would it be possible to do that with the Google Cloud Search API? Or should I be looking into something like Elasticsearch?
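The behavior asked for here (match any of the query's 3-grams, rank by how many matched) is easy to see in a small in-memory sketch. This is not the Cloud Search API or Elasticsearch; `trigrams`, `build_index` and `search` are hypothetical helpers, and ranking by raw match count is just the simplest scoring choice.

```python
from collections import Counter, defaultdict


def trigrams(text):
    """Return the set of lower-cased character 3-grams of text."""
    s = text.lower()
    if len(s) < 3:
        return {s} if s else set()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_index(brands):
    """Build an inverted index: 3-gram -> set of brand ids containing it."""
    index = defaultdict(set)
    for brand_id, name in brands.items():
        for gram in trigrams(name):
            index[gram].add(brand_id)
    return index


def search(index, query, limit=10):
    """Return (brand_id, matched_3grams) pairs for brands sharing at least one 3-gram, best first."""
    counts = Counter()
    for gram in trigrams(query):
        for brand_id in index.get(gram, ()):
            counts[brand_id] += 1
    return counts.most_common(limit)


brands = {1: "Nike", 2: "Nikon", 3: "Adidas"}
index = build_index(brands)
print(search(index, "nikee"))  # [(1, 2), (2, 1)]
```

Raw counts favor long names that happen to share many 3-grams with the query; dividing by the size of the union of both 3-gram sets (Jaccard similarity) is a common normalization.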
Best.

For anyone in a similar situation: we ended up using Elasticsearch, and it has proven to be a lot more flexible than Google Full Text Search.
And even though searching for a limited number of N-grams was not possible, Elasticsearch allows edit-distance queries, which helped us find misspellings and similar words; this was essential in our use case.
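Edit distance is the number of single-character insertions, deletions or substitutions needed to turn one string into another. The sketch below is the classic Levenshtein dynamic-programming algorithm, included only to illustrate the idea; Elasticsearch's fuzzy queries are implemented differently inside Lucene and can also count a transposition of two adjacent characters as a single edit. `levenshtein` and `fuzzy_matches` are hypothetical helpers.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def fuzzy_matches(query, names, max_edits=2):
    """Names within max_edits of the query, closest first (case-insensitive)."""
    scored = [(levenshtein(query.lower(), n.lower()), n) for n in names]
    return [n for d, n in sorted(scored) if d <= max_edits]


print(fuzzy_matches("addidas", ["Adidas", "Nike", "Adidas Originals"]))  # ['Adidas']
```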
We also noticed a great improvement in search speed, and especially in indexing speed.

Related

What's the difference between GAE Search API and Datastore queries?

I'm trying to understand which of the Search API and Datastore queries is better suited for a search engine in my app.
I want something very scalable and fast, meaning it must find something among up to millions of results very quickly. Also, as soon as data has been registered, it must be immediately searchable.
I'm looking to build an autocomplete search system inspired by Google Search (with suggested results in real time).
So, which option is more appropriate for a Google App Engine user?
Thanks
Both Search API and the Datastore are very fast, and their performance is not dependent on the number of documents that you index.
Search API was specifically designed to index text documents. This means you can search for text inside these text documents, and any other fields that you store in the index. Search API only supports full-word search.
The Datastore will not allow you to index text inside your documents. The Datastore, however, allows searching for entries that start with a specific character sequence.
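A prefix search like this works because Datastore indexes are kept sorted, so "starts with" becomes a range scan: everything >= prefix and < prefix + u'\ufffd' (a commonly used upper-bound idiom). The sketch below imitates that with binary search over a sorted Python list. `prefix_range` is a hypothetical helper; in NDB the equivalent filter would look like `Artist.query(Artist.name >= prefix, Artist.name < prefix + u'\ufffd')` for a hypothetical `Artist` model.

```python
import bisect


def prefix_range(sorted_keys, prefix):
    """Return keys starting with prefix via two binary searches on a sorted list."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    # "\ufffd" sorts after most characters, so prefix + "\ufffd" is a practical upper bound.
    hi = bisect.bisect_left(sorted_keys, prefix + "\ufffd")
    return sorted_keys[lo:hi]


names = sorted(["abba", "abc", "abcd", "abd", "b"])
print(prefix_range(names, "abc"))  # ['abc', 'abcd']
```

Like the Datastore idiom, this misses keys whose next character sorts above U+FFFD (many emoji, for example), and it is case-sensitive, so names are usually stored lower-cased in a separate property for searching.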
Neither of these platforms will meet your requirements out of the box. Personally, I used the Datastore to build a search engine for music meta-data (artist names, album and recording titles) which supports both search inside a text string and suggestions (entries that contain words at any position which start with a specific character sequence). I had to implement my own inverted index to accomplish that. It works very well, especially in combination with Memcache.
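The answer doesn't show its implementation, so the following is only a minimal in-memory sketch of one way an inverted index for suggestions can work, assuming every prefix of every word is indexed and every query word is treated as a prefix. `PrefixIndex` is a hypothetical class; on App Engine the postings would live in Datastore entities (with Memcache in front) rather than a Python dict.

```python
from collections import defaultdict


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class PrefixIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # word prefix -> ids of entries with a word starting with it
        self.titles = {}                  # entry id -> original title

    def add(self, entry_id, title):
        self.titles[entry_id] = title
        for word in tokenize(title):
            for n in range(1, len(word) + 1):
                self.postings[word[:n]].add(entry_id)

    def suggest(self, query):
        """Titles in which every query token is a prefix of some word."""
        tokens = tokenize(query)
        if not tokens:
            return []
        result = None
        for tok in tokens:
            ids = self.postings.get(tok, set())
            result = ids if result is None else result & ids
        return sorted(self.titles[i] for i in result)


idx = PrefixIndex()
idx.add(1, "Dark Side of the Moon")
idx.add(2, "Darkness on the Edge of Town")
idx.add(3, "Moonlight Sonata")
print(idx.suggest("dark si"))  # ['Dark Side of the Moon']
```

Indexing every prefix multiplies storage by roughly the average word length; capping the indexed prefix length is a common trade-off.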

Using Solr in ColdFusion - understanding

This might be a very silly question, but please bear with me and help me out.
I have only a basic understanding of what Solr is. We have Solr search on our website, which is built in ColdFusion. I have never worked with website search before. I did look it up, but I'm still not quite clear.
Does it do a web search for the input string?
Or does it do a database search for the string?
Thanks
Solr is a search engine that takes in data, stores it in an index and provides fast lookups; it searches that index rather than the web or your database directly. It uses Apache Lucene for indexing.
You can query Solr for a string, and it will return a list of matches, which can then be displayed on your website.
Refer to this presentation for an introduction to Solr.
Note that Solr offers many features that enhance the user experience, e.g. faceted navigation.
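To make "query Solr for a string" concrete: Solr is usually queried over HTTP through its /select request handler, with `q` holding the query and optional `facet` parameters for faceted navigation. The sketch only builds such a URL; the host, the core name `products` and the facet field `brand` are hypothetical, and on a ColdFusion site the built-in search tags (such as cfsearch) usually hide this request from you.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, text, facet_field=None, rows=10):
    """Build a Solr /select URL for a text query, optionally asking for facet counts."""
    params = [("q", text), ("rows", rows), ("wt", "json")]
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


print(solr_select_url("http://localhost:8983/solr", "products", "red shoes", facet_field="brand"))
# http://localhost:8983/solr/products/select?q=red+shoes&rows=10&wt=json&facet=true&facet.field=brand
```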

Storing 100k map markers in App Engine

I'm designing yet another "Find Objects near my location" web site and mobile app.
My requirements are:
Store up to 100k objects;
Query for objects that are close to a point (my location, a city, etc.) and match other search criteria (like object type);
Display results on Google Maps with smooth performance;
Let users filter objects by object time.
I'm thinking about using Google App Engine for this project.
Could you recommend the best data storage option for this?
A few words about a dynamic data-loading strategy would also help.
I feel a bit overwhelmed by the options at the moment and am looking for hints on where to continue my research.
Thanks a lot!
I'm going to assume that you are using the Datastore. I'm not familiar with Google Cloud SQL (which I believe aims to offer MySQL-like features in the cloud), so I can't say whether it can do geospatial queries.
I've been looking into the whole "get locations in proximity of a location" problem for a while now. I have some good and bad news for you, unfortunately.
The best way to do a proximity search in the Google environment is via the Search Service (https://developers.google.com/appengine/docs/python/search/, or find the Java link). The reason is that it supports a "Geopoint Field" and lets you query against it.
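For reference, a Search API geopoint query uses the `distance(field, geopoint(lat, lng))` function in the query string, with the distance in meters. The hypothetical helper below only assembles such a string; the field names `location` and `type` are assumptions, and the actual search call (roughly `search.Index(name=...).search(query)` in the Python SDK) is left out because it only runs inside App Engine.

```python
def geo_query(field, lat, lng, max_meters, extra=None):
    """Search API query string: documents whose geopoint field is within max_meters of (lat, lng)."""
    clause = f"distance({field}, geopoint({lat}, {lng})) < {max_meters}"
    # Optionally combine with another restriction (e.g. an object type) using AND.
    return f"{clause} AND {extra}" if extra else clause


print(geo_query("location", 52.52, 13.405, 5000, extra="type = cafe"))
# distance(location, geopoint(52.52, 13.405)) < 5000 AND type = cafe
```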
Ok, cool, so there is support, right? However, "A query is complex if its query string includes the name of a geopoint field or at least one OR or NOT boolean operator". The free quota for complex search queries is 100/day; beyond that, it costs 60 cents per 10,000 queries. Depending on your application, this may be an issue.
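As a worked example using the figures quoted in this answer (100 free complex queries per day, then $0.60 per 10,000; these prices date from the answer and may have changed), the daily cost is (queries - 100) / 10,000 × $0.60. For instance, 10,100 complex queries a day leaves 10,000 billable queries, i.e. $0.60 per day or about $18 over 30 days. `complex_query_cost_usd` is a hypothetical helper and ignores any billing granularity.

```python
def complex_query_cost_usd(queries_per_day, free_per_day=100, usd_per_10k=0.60):
    """Daily cost of complex Search API queries under the quoted (possibly outdated) pricing."""
    billable = max(0, queries_per_day - free_per_day)
    return billable / 10_000 * usd_per_10k


for q in (100, 10_100, 100_100):
    print(q, round(complex_query_cost_usd(q), 2))
# 100 0.0
# 10100 0.6
# 100100 6.0
```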
I'm not too familiar with the Google Maps API, but you might be able to pull off something like this: (https://developers.google.com/maps/articles/phpsqlsearch_v3)
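If you end up filtering locations yourself, store-locator style, the usual ingredient is the haversine great-circle distance. A minimal sketch, assuming objects are plain dicts with hypothetical `name`, `type`, `lat` and `lng` keys and a mean Earth radius of 6371 km:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(objects, lat, lng, radius_km, obj_type=None):
    """Names of objects within radius_km of (lat, lng), nearest first, optionally of one type."""
    hits = []
    for obj in objects:
        if obj_type is not None and obj["type"] != obj_type:
            continue
        d = haversine_km(lat, lng, obj["lat"], obj["lng"])
        if d <= radius_km:
            hits.append((d, obj["name"]))
    return [name for _, name in sorted(hits)]


objects = [
    {"name": "A", "type": "cafe", "lat": 0.0, "lng": 0.01},
    {"name": "B", "type": "bar", "lat": 0.0, "lng": 0.005},
    {"name": "C", "type": "cafe", "lat": 1.0, "lng": 1.0},
]
print(nearby(objects, 0.0, 0.0, 5))  # ['B', 'A']
```

Scanning all 100k objects on every request is linear; in practice you would first narrow candidates with a latitude/longitude bounding box (or a geohash prefix) and compute exact distances only for those.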
My current project/problem involves moving locations rather than "static" ones (stores, landmarks, etc.). I've decided to go with Amazon DynamoDB, which has a library that supports geospatial indexing: http://aws.amazon.com/about-aws/whats-new/2013/09/05/announcing-amazon-dynamodb-geospatial-indexing/

Big Query vs Text Search API

I wonder whether BigQuery is going to replace or compete with the Text Search API. It's kind of a silly question, but the Text Search API has been in beta for a few months and has a very strict API call limit, whereas BigQuery is already available and looks very promising. Any hints on which to choose for searching over constantly incoming error logs?
Google BigQuery and the App Engine Search API fulfill the needs of different types of applications.
BigQuery is excellent for aggregate queries (think: full table scans) over fixed-schema data structures in very large tables. The aim is speed and flexibility. BigQuery lacks the concept of indexes (by design). While it can be used for "needle in a haystack" searches, it really shines over large, structured datasets with a fixed schema. For document-style searches, BigQuery records have a fixed maximum size, so it is not ideal for document search engines. So I would use BigQuery for queries such as: in my 200 GB of log files, what are the 10 most common referral domains, and how often did I see them?
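The referral-domain example is a typical aggregate: scan everything, group, count, sort, keep the top 10. The sketch shows that logic in plain Python over a list of referrer URLs, with a corresponding SQL query in a comment; the table `logs.requests` and column `referrer_domain` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Roughly what the aggregate query would express in SQL (hypothetical table/column):
#   SELECT referrer_domain, COUNT(*) AS hits
#   FROM logs.requests
#   GROUP BY referrer_domain
#   ORDER BY hits DESC
#   LIMIT 10


def top_referral_domains(referrer_urls, n=10):
    """Count requests per referrer host and return the n most common (host, count) pairs."""
    counts = Counter()
    for url in referrer_urls:
        host = urlparse(url).hostname  # lower-cased host, or None for empty/invalid URLs
        if host:
            counts[host] += 1
    return counts.most_common(n)


urls = ["https://a.com/x", "https://b.org/", "http://A.com/y", "", "https://a.com/z"]
print(top_referral_domains(urls, 2))  # [('a.com', 3), ('b.org', 1)]
```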
The Search API provides sorted search results over various types of document data (text, HTML, geopoint etc). Search API is really great for queries such as finding particular occurrences of documents that contain a particular string. In general, the Search API is great for document retrieval based on a query input.

Google Cloud Storage performance and full-text search

I'm developing an application that serves data to users (like we all do, right?). It's crucial for the data to be full-text searchable. For now, we store over 30 million records; some are searchable on only one field, others on several fields.
I'm considering Google Cloud solutions, but I'm very new to their storage technology. Searching for full-text search on their cloud brings up results like "it's experimental", "there are some workarounds", etc.
Could someone tell me, from their own experience, what is possible and how full-text search performs in Google Cloud Storage solutions?
Thx in advance,
trzewiczek
As far as I know, full-text search for App Engine (and Google Cloud Storage) isn't available yet.
There is a blog post here about it:
http://googleappengine.blogspot.com/2011/10/app-engine-155-sdk-release.html
And a form here to register your interest, but you have to have a proper app to apply it to:
https://docs.google.com/spreadsheet/viewform?formkey=dEdWcnRJUXZ2VGR3YmVsT1Q1WVB2Smc6MQ&ndplr=1
To at least try to answer your question, though: it's Google, so it's going to be fast, isn't it? They have been holding it back for a long time, presumably because it's not up to the task yet, but hopefully that will turn out to be a good thing in the long run.
