The feature documentation says that search-engine-like keyword highlighting is supported in Vespa, but I couldn't find any example of how to implement it.
You control this in the search definition document, on a per-field basis.
See
https://docs.vespa.ai/documentation/reference/search-definitions-reference.html#bolding
You can also instruct it to make a small dynamic snippet with the summary: dynamic option.
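As a minimal sketch (the search and field names here are hypothetical), a search definition that enables bolding on one field and a dynamic snippet on another could look like:

```
search article {
    document article {
        field title type string {
            indexing: summary | index
            bolding: on
        }
        field content type string {
            indexing: summary | index
            summary: dynamic
        }
    }
}
```

With bolding: on, matched query terms come back wrapped in highlight tags in the summary; summary: dynamic instead returns a short snippet around the matches.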
I'm trying to add multiple suggesters to an Azure Search index, but receive this error response:
An index cannot have more than one suggester with searchMode='analyzingInfixMatching'
The property on the index is called suggesters and is of type array.
The documentation states:
The only mode currently supported is analyzingInfixMatching
Is the API defined to support future capabilities? Or am I overlooking an option that would let me add multiple suggesters?
And to understand why:
I have fields in my index for different languages. I would like a suggester to include only the field for the language the current user has specified.
With the current implementation I can only provide suggestions or autocomplete based on all available languages.
I work on the Azure Cognitive Search team. As you guessed, the schema is designed to be adaptive to future changes, but at the moment only one suggester is supported, with searchMode analyzingInfixMatching.
If I understand correctly, you want to apply suggestions selectively, only to the fields specified by the user at query time. You can achieve that through the searchFields property of the Suggestions API: define a single suggester with all the fields that users could potentially specify, then restrict it to the required ones at query time using that property.
More details here - https://learn.microsoft.com/rest/api/searchservice/suggestions
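As a sketch (the index and field names are made up for illustration), the index definition would list every candidate field in the suggester's sourceFields:

```json
{
  "name": "products",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true },
    { "name": "title_en", "type": "Edm.String", "searchable": true },
    { "name": "title_fr", "type": "Edm.String", "searchable": true }
  ],
  "suggesters": [
    {
      "name": "sg",
      "searchMode": "analyzingInfixMatching",
      "sourceFields": [ "title_en", "title_fr" ]
    }
  ]
}
```

A suggestions request for a French-speaking user would then narrow the suggester to the French field, along these lines:

```
GET /indexes/products/docs/suggest?search=sam&suggesterName=sg&searchFields=title_fr&api-version=...
```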
Why does Google Structured Data Testing Tool show an error in this case?
How to resolve it?
Google’s Structured Data Testing Tool is not a general structured data validator. It only recognizes terms from vocabularies which Google makes use of (e.g., Schema.org and the deprecated Data-Vocabulary.org).
You are using the GS1 vocabulary, which doesn’t seem to be one of the vocabularies supported by Google.
All terms from other vocabularies produce this error. It’s perfectly fine to use such terms, so simply ignore these errors.
You might try it on the Structured Data Linter, which is not so tightly bound to Schema.org or Data-Vocabulary.org. IIRC, it doesn't have built-in knowledge of GS1, but that could be added fairly easily. Issues and pull requests at http://github.com/structured-data/linter.
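To make the situation concrete, here is a hedged JSON-LD sketch (the property and values are illustrative) mixing the two vocabularies. The testing tool recognizes the Schema.org term but flags the GS1 one, even though the markup itself is valid:

```json
{
  "@context": {
    "schema": "http://schema.org/",
    "gs1": "https://gs1.org/voc/"
  },
  "@type": "schema:Product",
  "schema:name": "Example product",
  "gs1:netContent": "500 g"
}
```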
I'm implementing auto-suggestion on a web page (ASP.NET MVC) with Solr, and I understand there are a number of ways to do this, including:
jQuery Autocomplete, the Suggester component, facets, or NGramFilterFactory.
Which one is fastest for auto-suggestion?
Is there any good information about this?
You should take a good look at AJAX Solr at https://github.com/evolvingweb/ajax-solr .
AJAX Solr has an autocomplete widget, among other things. Demo site: http://evolvingweb.github.io/ajax-solr/examples/reuters/index.html .
Here's my shot at addressing your need, with this disclaimer:
"Fastest" is a vague term that depends on a broad spectrum of factors, e.g. the browser used, page weight, and network latency. Those need to be optimized outside the search implementation, if need be.
I would go for the straightforward implementation first and then optimize based on performance stats.
OK, now to the implementation, at a high level:
1) Create a Solr index with a field having an NGramTokenizerFactory tokenizer.
- to reduce chatter, keep the NGram minimum length at 2, and only fire autosuggest once the user has typed at least 2 characters
2) Depending on technology used, you can either route search requests through your application, or hit Solr directly. Hitting Solr directly could be faster (Ref AjaxSolr as mentioned already).
3) Use something like jQuery UI to have an autosuggest implementation, backed by AJAX requests to Solr.
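To see why a minimum gram length of 2 reduces chatter, here is a small Python sketch (illustrative only, not Solr's exact tokenizer implementation or output order) of the character n-grams an NGram tokenizer produces for a term:

```python
def ngrams(term, min_len=2, max_len=5):
    """Character n-grams of term, roughly as an NGram tokenizer would emit them."""
    return [term[i:i + n]
            for n in range(min_len, max_len + 1)
            for i in range(len(term) - n + 1)]

# A minimum length of 1 would index every single letter of every term,
# bloating the index and matching almost everything; starting at 2
# keeps the gram count (and the result noise) manageable.
print(ngrams("sol", 2, 3))   # -> ['so', 'ol', 'sol']
```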
Here are a couple of reference implementations:
http://isotope11.com/blog/autocomplete-with-solr-and-jquery-ui
https://gist.github.com/pduey/2648514
Note that there are similar implementations working well on live sites, so I would be tempted to try this out and see whether there is still a bottleneck, rather than doing any premature optimization.
AJAX Solr has limitations with respect to autosuggestions, as it provides only word-level suggestions; internally it uses faceting to generate them.
But Solr provides different suggesters that we can leverage to generate intelligent autosuggestions (words and phrases).
Check out this blog post to learn more:
http://lucidworks.com/blog/solr-suggester/
For implementation, you can use combination of suggesters(FST + Analyzing Infix) and jQuery autocomplete.
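As a hedged sketch (field and component names are illustrative, adapt them to your schema), an AnalyzingInfix suggester is wired up in solrconfig.xml roughly like this:

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">infixSuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

The jQuery autocomplete widget then simply points its AJAX source at the /suggest handler.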
I have an Apache Solr core from which I need to pull out the popular terms. I'm already aware of Luke, facets, and Solr stopwords, but I'm not getting what I want. For example, when I use Luke to get the popular terms and then apply the stopwords to the result set, I get a bunch of words like:
http, img, que ...etc
While what I really want is:
Obama, Metallica, Samsung ...etc
Is there any better way to implement this in Solr? Am I missing something that should be used to do this?
Thank You
Finding relevant words in a text is not easy. The first thing I would take a deeper look at is Natural Language Processing (NLP) with Solr. The article in Solr's wiki is a starting point for this; reading the page you will stumble over the Full Example, which extracts nouns and verbs. Probably that already helps you.
During the process of getting this running you will need to install additional software (Apache's OpenNLP project), so after the Solr wiki, that project's home page may be the next step.
To get a feeling for what is possible with it, you should have a look at the Searchbox demonstration, where you can paste in a sample text and have the relevant words and terms extracted from it.
There are several tutorials out there you may have a look at for further reading.
If you go down that path and the results are not as expected or not as good as required, you may go even further and start thinking about text mining with Apache Mahout. There are, again, several tutorials out there on combining it with Solr.
In any case, you should then search Stack Overflow or the web for the tutorials and how-tos you will certainly need.
Update about Arabic
If you are going to use OpenNLP for an unsupported language (as of version 1.5, Arabic is unfortunately not supported out of the box), you will need to train OpenNLP for that language. The reference for this is in the OpenNLP developer docs. There may already be something out there from the Arabic community, but my Arabic google-fu is not that good.
Should you decide to do the work and train it for Arabic, why not share your training with the project?
Update about integration in Solr/Lucene
There is work going on to integrate it as a module. In my humble opinion, this is as far as it will and should get. Compared to this problem field, stemming appears rather easy, and even stemming got complex once it had to support different languages. Analysing a language to the level that you can extract nouns, verbs, and so forth is so complex that a whole project evolved around it.
Having a module/contrib at hand which you could simply copy to solr_home/lib would already be very handy, as there would be no need to run a separate installer and so forth.
Well, this is a bit open-ended.
First, facet to find the "popular terms" in your index, then add all the non-useful items such as http, img, time, what, when, etc. to your stopword list and re-index to get the cream of the data you care about. I do not think there is an easier way of finding popular names, unless you can bounce your data against a custom dictionary of nouns during indexing (that is an option, by the way). You can choose to index only names by having a custom token filter (look at how the stopword filter works) backed by your own nouns.txt file, so that only words in your dictionary make it into the index. This approach is only possible if you have a finite, known list of nouns.
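For the dictionary-of-nouns approach, Solr's stock KeepWordFilterFactory does the inverse of a stopword filter: it keeps only the tokens listed in a file. A hedged schema.xml sketch (the field type name and nouns.txt contents are up to you):

```xml
<fieldType name="names_only" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Discard every token that is NOT in nouns.txt -->
    <filter class="solr.KeepWordFilterFactory" words="nouns.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Copy your content into a field of this type, re-index, and facet on it to get popular terms drawn only from your noun list.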
Maybe I am missing something, but is there any way to use the new text search features described in the 2011 presentation http://www.youtube.com/watch?v=7B7FyU9wW8Y (approx. 30 min mark) with Objectify, entities, and Java? I realize it is an experimental release, but the text search features that are present don't seem to cover the full extent discussed in the presentation. I don't want to have to write my own code to manage the creation of and updates to documents, but I don't currently see another way.
Currently, you can't use Full Text Search to search through entities in Datastore; you'll need to create search documents in a search index to use the Full Text Search API, as described in these docs.