How to do FTS within Google Cloud Platform - google-app-engine

Does Google Cloud Platform have a product to do full-text search via an API with non-web data (such as json or xml documents)? This may seem like a pretty silly question, but the only options I have come across are:
Search inside of Google App Engine (only available for python2, not python3) -- https://cloud.google.com/appengine/training/fts_intro/.
Related to web search only: https://developers.google.com/custom-search/docs/tutorial/introduction
Using a managed Elasticsearch: https://console.cloud.google.com/marketplace/details/google/elasticsearch.
Cloud firestore explicitly states it doesn't offer that and suggests using Aloglia (and gives details on integrating): https://cloud.google.com/firestore/docs/solutions/search
Is there something I'm missing? I'm basically looking to index and search about a million documents in a sort of free-form type of search. Is this offered as a product from Google outside of App Engine? If so, how can I access it?

You have pretty much covered it there. There is currently no specific Google service for full-text search. As you mentioned, App Engine Search API is available for Python 2.7, which will stop being maintained after January 2020, and not Python 3.
There is one more option you could consider, which is using Lucene foe GAE. I found this blog where several possibilities are studied, perhaps could be an interesting reading for you.
To sum up, I would recommend ElasticSearch or Aloglia, but for the latter you need a Firebase project.

Related

Apache Solr: Can apache solr be used as a third part system for indexing and searching for documents from different websites?

I am working on implementing a research web application or portal that integrates different research portal or website using an open source platform called search kit. The web application will act as a central point of access to research publications on different research portals. To do this, I also need to implement a third party system that does the following:
Searches for documents based on user query on the other different research portals and presents or displays the results to the users on my web application.
Index the documents
Should be used by system administrators to configure the web application. Whereby system administrators can add,remove or modify the URL of the website Solr is pulling documents from
Displays the results to the user in one standard format.
My question is, can apache solr be used to implement the third party system? if not, what open source platform or way would you recommend I used to implement the third party system?
In general, Solr seems like a good fit here, but you might need some custom code (apart from configuration) here and there. To go through the points:
Querying is one of the main features of Solr, so this is definitely possible.
Indexing is handled by Solr.
There was a component for Solr called "Data Import Handler" that supported indexing from URLs (see the docs). However, this was removed from the main Solr distribution, and was moved to a separate package. This package doesn't seem to be actively maintained though, so you will probably run into some problems if you decide to use it. The alternative is to develop your document-pulling code yourself.
Solr can display the results in multiple formats, but it still might not support the exact format you would like it to be. In this case, you need to build your transformation based on the result from Solr.

Choose Lucene or Solr

We need to integrate a search engine in our plataform Catalog management software in Share point. The information is stored in multiple databases and a storage of files ( doc , ppt , pdf .....). Our dev platform is Asp.Net and we have done some pre-liminary work on Lucene, found it to be good. However, we just came to know of Solr.
We need to continue using lucene, but we need to defend her the solr.
Please any help is accepted.
And sorry for my english.
Lucene is a full-text search library used to provide search functionalities to an application. It can't be used as an application by itself. Solr is a complete search engine built around Lucene providing its search functionalities and others. Solr is a web application that can be used by itself without any development around it.
If you need a search engine to be called by your application I recommend you to use Solr.

How to integrate Solr with Web Application

After reading many Solr books and article all over on the net, now I have an idea of the power of this server.
But... how to integrate it in a real application? For example: a web site written in PHP, etc.
Right now, I understand that Solr produces XML, JSON etc results... so to integrate this in a web application, the "simple" work is to convert this information for render in a page or there are other technique to avoid this?
I'm my case, I have to develop a search engine to scan many documents and find result.
My idea was:
Use Solr to build an index and search documents
Use a web application to show the result
Looking on the net I haven't find anything that explains how to integrate Solr in a real application, all the reading are about "How to use Solr... with Solr..." Anything about a real integration.
Does someone have some useful resource how to integrate Solr in a real application, with some clean examples?
Edit: It looks like Apache maintains their own list of recommended
client APIs, and their recommended tool for PHP is Google's
library (though they refer to it as SolPHP). Given this, I imagine that this is the best place
to start.
A Solr library for the programming language you're using could save you some of the trouble in implementing the integration. For instance, if your site is written in PHP, you could try Google's Solr library for PHP.
I have done most of my Solr work in Java, so I have used SolrJ quite a bit. This is a well supported tool because it comes from Apache in parallel with the Solr product itself.
If you are doing work in any other languages, you are likely to find libraries available for them. The amount of time they save you may vary according to the quality of the library itself.
When I was using Solr in my project, only my application server (that is Tomcat) was communicating with Solr server. I wrote a class, which executes GET requests to Solr server based on input provided by end user. When Solr returns XML/JSON back to an application server you may parse it and process as every other bussiness data (render an *.html). So, summing up, Web Browser never communicates directly with Solr, all goes through an application server:
WebBrowser -> GET to application server -> GET to Solr server
show *.html <- parse XML/JSON, render *.html <- return XML/JSON

Google App Engine : GeoPtProperty query

I have this latlng = db.GeoPtProperty() in app engine.
Given a lat long, how do I query the first 10 closer latlng without using any third party library?
Is there any nice documentation for me to refer?
Thanks in advance.
Geospatial queries aren't supported natively by the datastore. There are userland implementations however, including geomodel, documented here.
I don't believe this is natively supported yet. However, there is a talk at Google I/O 2011 on App Engine full text search (emphasis mine):
"At last we are adding a full text
search service to App Engine. The
upcoming service will be built on top
of the very infrastructure used by
Google. In addition to full text
search queries we will also offer
numeric, geo, date search
capabilities, and much more. This
session will cover the basic full text
search API, briefly outline more
advanced features, and how full text
search ties to existing services such
as datastore."
Stay tuned...

How can one perform full text search in Google App Engine?

It's a simple question, but I haven't found the answer anywhere. Thoughts and input appreciated.
I'm using Django, too, for what it's worth. :)
Cheers.
The Search API is now available as experimental for Java and Python .
With Java GAE, you could use Compass, but that won't help with Django. For Python, Bill Katz offers one solution -- open source -- and these guys offer a Django-specific approach which, however, is free only for non-commercial applications (i.e. if your app makes money they want you to pay for their full-text search). I have no real-world experience with either of these solutions so I can't really give well-grounded recommendations, but from what one can see with just a little playing around they seem quite useful.
An overview of the Python App Engine searches that I am aware of:
Google did add a cut down search using SearchableModel although that has limitations (5000 indexed word limit, String property only not Text):
http://groups.google.com/group/google-appengine/browse_thread/thread/f64eacbd31629668/8dac5499bd58a6b7?lnk=gst&q=searchablemodel
Or as another posters have pointed out there are these options:
The Quick and simple text search:
http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-App-Engine
This product which has a fairly comprehensive free version and a more extensive commercial version:
http://gae-full-text-search.appspot.com/customers/download/
I've read that Google do have a project to bring full text search to App Engine although this is not scheduled to happen any time soon
I'd really like to see a comparison of the various searching frameworks and see how they stack up to each other. Does anyone know of any report like this?
Edit:
Google Search API now available (although still experimental)
For now, the real answer is that there is no real full-text search on Google App Engine. The solutions provided by the other answers here are fine for toy data sets, but do not scale to anything more than O(10000) documents or so. Google will have to provide search as an infrastructural feature of GAE. See the feature request for (mostly superfluous) discussion.
# The Quick and simple text search:
http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-App-Engine
this solution did not work for me - and looking at the limitations below, it is unlikely to be useful for real use cases.
It uses StringListProperty to store phrases which has a limitation of 500 characters.
It does not work with the standard query filters.
Issue 217 Bill Katz released a package to deal with and http://gae-full-text-search.appspot.com/ is available alternatively, levensthein is a another match measure
You should be able to adapt Whoosh! to write in the datastore instead of on disk. It's a pure python full-text search engine. It's not as fast or full-featured as Lucene, but it should run on GAE without too many modifications.

Resources