Phonetic filter factory for Hindi - solr

I am working with Apache Solr and trying to use the phonetic filter factory. I have tried all the encoders available with solr.PhoneticFilterFactory, but none of them supports Indian languages. Is there any other filter or method available that gives a phonetic representation for Indian languages, e.g. Hindi, Tamil, Bengali?
If not, how can we modify the existing filters to support these languages?

Have you tried the new Beider Morse Filter Factory, which was just added in version 3.6 and is (alas) not yet well-documented?
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.BeiderMorseFilterFactory
It was developed for phonetic searching of Central and Eastern European surnames, but maybe it would work for other languages too. I have personally found that it works much better than Soundex or the other older soundalike methods.

Related

Attribute Comparators in Vespa.ai

Does Vespa support comparators for string matching such as Levenshtein, Jaro–Winkler, Soundex, etc.? Is there any way to implement them as plugins, as some are available in Elasticsearch? What are the approaches for this type of search?
The match modes supported by Vespa are documented here https://docs.vespa.ai/documentation/reference/schema-reference.html#match plus regular expressions for attribute fields https://docs.vespa.ai/documentation/reference/query-language-reference.html#matches
None of the mentioned string matching/ranking algorithms is supported out of the box. Both edit distance variants sound more like a text ranking feature, which should be easy to implement. (Open a GitHub issue at https://github.com/vespa-engine/vespa/issues)
The matching in Vespa happens in a C++ component, so there is no plugin support there yet.
You can deploy a plugin written in Java in the container by deploying a custom searcher (https://docs.vespa.ai/documentation/searcher-development.html). Then you can work on the top k hits, using e.g. regular expressions or n-gram matching to retrieve candidate documents. The Soundex algorithm can be implemented accurately using a searcher and a document processor.
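To illustrate the "work on the top k hits" idea, here is a minimal sketch of re-ranking candidate strings by Levenshtein distance. It is written in Python only to show the logic; a real Vespa searcher would be written in Java, and the function names levenshtein and rerank are hypothetical, not part of any Vespa API.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def rerank(query, hits):
    """Order already-retrieved candidate strings by distance to the query."""
    # Assumption: hits are plain strings; real hits would carry more fields.
    return sorted(hits, key=lambda hit: levenshtein(query.lower(), hit.lower()))
```

The candidates would first be retrieved with a cheaper match (for example n-gram matching), and only those top k hits would be passed to rerank.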

AngularJS - spell checker in German

I have found a spell checker in English:
spellchecker english
Is there also one in German?
Try JavaScript SpellCheck.
It supports a wide variety of languages including German.
In most of the applications I have developed, I used a key-value JS file for each required language. I either received the translations from the business, or I translated the English into the target language using Google Translate and had the business verify the result. Once you have those files, it is very easy to apply them in the application using the Angular translation service.

Phonetic search with Solr, for Brazilian Portuguese

We are implementing Solr as the new internal search engine for our website.
Most features are running just fine; others are in the adjusting and calibration phase.
But there is one feature for which I cannot find any good documentation on the web. So here it goes:
How can I implement phonetic search and suggest with Solr for Brazilian Portuguese?
I was already able to create an index using the official stemming tokenizer
http://docs.lucidworks.com/display/solr/Language+Analysis#LanguageAnalysis-BrazilianPortuguese
But matching against it uses parsers adapted to treat everything as English. That is where the problem lies.
Tutorials, documentation, how-tos, or replies are welcome.
You should use SynonymFilterFactory (see the Solr documentation).
#example of synonym definition for brazilian's living people kkk :)
copa=>futebol
football=> futebol
brasil, brazil => brasil
Caution: use this filter at index time. Sadly, it will have no effect at query time :(
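As a rough illustration of what the "=>" rules above do to a token stream, here is a small Python sketch. It is not Solr's implementation: it handles only single-word explicit mappings, and the names RULES, parse_rules and apply_synonyms are made up for this example.

```python
RULES = """
# example rules in the same format as the synonym definitions above
copa=>futebol
football=> futebol
brasil, brazil => brasil
"""


def parse_rules(text):
    """Build a {source token: [replacement tokens]} map from '=>' rules."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an explicit mapping.
        if not line or line.startswith("#") or "=>" not in line:
            continue
        left, _, right = line.partition("=>")
        targets = [t.strip() for t in right.split(",") if t.strip()]
        for source in left.split(","):
            source = source.strip()
            if source and targets:
                mapping[source] = targets
    return mapping


def apply_synonyms(tokens, mapping):
    """Replace each token that has a rule; pass other tokens through."""
    out = []
    for token in tokens:
        out.extend(mapping.get(token, [token]))
    return out
```

With these rules, the tokens copa, brazil and 2014 become futebol, brasil and 2014, so documents and queries that use either spelling end up with the same indexed term.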

full text search engine in hebrew

I want to try and use Elasticsearch as a full text search engine for a website in Hebrew.
I wanted to know if Elasticsearch can produce good results for Hebrew and if there are any big websites in Israel that use it as their search engine.
If not ElasticSearch - maybe Apache Solr?
By the way - I'm using Ruby, but can work with Java as well.
Thanks!
Have a look at the ICU plugin for Elasticsearch.
David.
Solr seems to support Hebrew, see links to Language Analysers below:
Solr language analysis in Hebrew
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory
Although I am not certain what the options for ElasticSearch are.
Look at hebmorph - http://www.code972.com/blog/hebmorph/
It's a Lucene plugin, and we've been working with it at http://alpha.gov.il and http://www.guidestar.org.il/
Take a look at Algolia
By design the Algolia engine is language agnostic. Out of the box, it supports all languages / alphabets, including symbol based languages such as Chinese, Japanese and Korean.
Additionally, Algolia handles multiple languages on the same website/app, meaning some users could search in French and some in English, using the same Algolia account in the background.
The purpose of this guide is to explain how to organize your indices to enable multi-language search.
Taken from here

Does Solr have an equivalent to CompassQueryBuilder?

I am rewriting our company's search functionality to use Solr instead of Compass. Our old code is using CompassQueryBuilder.CompassQueryStringBuilder to build a query out of a list of keywords. The keywords may have spaces in them: for example: "john smith", "tom jones".
Is there an existing facility I can use in Solr to replicate this functionality?
The closest thing I know for SolrJ is the solrj-criteria project. It seems to be currently unmaintained though.
Solr offers a wide variety of querying and indexing options. Fields that contain keywords with spaces in them can be supported by defining a custom type in the configuration file (see here). Queries containing keywords with spaces can be supported by specifying a custom QueryParser (see here).
Solr itself doesn't offer a QueryStringBuilder in an API. Actually, Solr itself doesn't offer any API classes at all, since all interaction is done by posting messages over HTTP. There are client libraries for Java, .NET, PHP, etc. In the SolrNet API there is a SolrMultipleCriteriaQuery, which is quite similar to the CompassQueryStringBuilder.
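Since the query ends up as a plain string sent over HTTP, a small helper can also build it by hand. The sketch below is in Python and is only an assumption about what such a builder might look like: it quotes each keyword as a phrase so that spaces are preserved, and joins the phrases with a boolean operator. The names quote_keyword and build_query are hypothetical and come from no Solr client library.

```python
def quote_keyword(keyword):
    """Wrap a keyword in double quotes so Solr treats it as one phrase."""
    # Escape backslashes first, then embedded double quotes.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_query(keywords, field=None, operator="OR"):
    """Join quoted keywords into a single query string."""
    phrases = [quote_keyword(k.strip()) for k in keywords if k.strip()]
    if not phrases:
        raise ValueError("at least one non-empty keyword is required")
    joined = (" " + operator + " ").join(phrases)
    # Optionally restrict the whole group to one field, e.g. name:(...).
    if field:
        return field + ":(" + joined + ")"
    return joined
```

For the keywords from the question, build_query(["john smith", "tom jones"]) produces "john smith" OR "tom jones", which can be passed as the q parameter of a Solr request.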
