Misspelling & synonym support for Azure Search? - azure-cognitive-search

I've seen threads discussing both of these topics:
Does Azure Search handle synonyms
Fuzzy Search in the Search API
I see that Liam Cavanagh from the Azure Search team seems to be the guy who's answered queries on these threads.
Liam, are you able to confirm the following yet please:
When full synonym support will be added to Azure Search
Do you definitely plan to support synonyms with Azure Search, or is it possible that you will recommend that customers use the Bing Synonyms product instead?
Do you have any plans to go beyond fuzzy logic and offer more advanced support for misspellings (i.e. multiple letters missing or in the wrong order, which stemming won't cover)?
Many thanks,
Ali

I don't know why you got a negative vote as I think these are really good questions. Let me answer your questions as best as I can:
You are correct that we have not implemented "full synonym support". It is one of our most highly requested features, so it is definitely on our near-term list, although I am sorry that I cannot provide a date yet. If you have time, please cast your vote for this here: http://feedback.azure.com/forums/263029-azure-search/suggestions/8410635-support-custom-dictionary In the meantime, there are "hacks" that are far from perfect but can get you part of the way. One example is to add a Collection field and then populate it with the relevant synonyms for each document.
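The Collection-field workaround could be sketched roughly as follows. This is only an illustration in plain Python: the `SYNONYMS` table, the `add_synonyms` helper, and the `title` and `synonyms` field names are all hypothetical, and the sketch assumes the index defines `synonyms` as a searchable string collection. It only prepares the document; uploading it to the index is left out.

```python
# Hypothetical synonym table; in practice this would come from your own domain data.
SYNONYMS = {
    "sofa": ["couch", "settee"],
    "tv": ["television", "telly"],
}


def add_synonyms(doc, text_field="title", synonym_field="synonyms"):
    """Return a copy of doc with a collection field listing synonyms of its words."""
    found = []
    for word in doc[text_field].lower().split():
        for synonym in SYNONYMS.get(word, []):
            if synonym not in found:  # keep the list free of duplicates
                found.append(synonym)
    enriched = dict(doc)  # do not mutate the caller's document
    enriched[synonym_field] = found
    return enriched


doc = add_synonyms({"id": "1", "title": "Leather sofa"})
print(doc)
```

A search for "couch" would then match this document through the extra field, at the cost of maintaining the synonym table yourself and re-indexing whenever it changes.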
I can not say that this is a "definite" feature, but given how often we hear this request, hopefully I have given you an insight into the likelihood that it will be implemented.
I am curious whether you have tried our brand new Lucene Query Expression support (https://msdn.microsoft.com/en-us/library/mt589323.aspx)? There are some really great capabilities for fuzzy search, as well as for things like RegEx searches. This is pretty awesome (IMO).
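As a rough illustration of what a fuzzy query looks like in Lucene syntax, the sketch below appends the `~N` edit-distance operator to each term and sets `queryType=full`, which is how the full Lucene parser is selected. The `fuzzy_query` helper and the sample terms are made up for this example, and other required request parameters (service URL, API version, key) are deliberately omitted.

```python
from urllib.parse import urlencode


def fuzzy_query(terms, max_edits=1):
    """Build a Lucene query string with a fuzzy operator on every term."""
    if max_edits not in (0, 1, 2):  # Lucene fuzzy search allows at most 2 edits
        raise ValueError("max_edits must be 0, 1 or 2")
    return " ".join(f"{term}~{max_edits}" for term in terms)


# Hypothetical misspelled user input.
params = {"search": fuzzy_query(["seatle", "hotel"]), "queryType": "full"}
query_string = urlencode(params)
print(params["search"])
print(query_string)
```

Because the edit distance is capped at two, this helps with one or two wrong, missing, or swapped letters, but it will not rescue heavily mangled words.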
I hope this helps, and I am sorry that I am not yet able to give more definitive dates on some of these questions.
Liam

Related

Is there a way to access Apache Solr (8.3.1) search queries without crawling the logs?

For statistic purposes I have to save and analyse all search queries made to a server running Solr (version 8.3.1). Maybe it's just because I haven't worked with Solr until today, but I couldn't find a simpler way to access these queries except crawling the logs.
I've only found one article to help me, in which the following is stated:
I think that solr by him self doesn't store the queries (correct me if I'm wrong, about this) but you can accomplish what you want by processing the solr log (its the only way I think).
(source: https://lucene.472066.n3.nabble.com/is-it-possible-to-save-the-search-query-td4018925.html)
Is there any more convenient way to do this?
I actually found a good way to achieve this in another SO-Question. Well, at least kind of.
Note: It is only useful if you have enough resources on the same server or another server to properly handle a second Solr-Core.
Link to original answer
That SO-Question is about Elastic-Search but the methodology of it can also be applied to this case with a second Solr-Core that indexes the queries made. (One can also add additional fields like when it was last searched, total search count, ...)
The functionalities of a search auto-complete are also achievable with this solution.
In short:
The basic idea is to use a second Solr instance to provide the means necessary for quickly saving the queries (instead of a DB for instance).
Remark: I'm not going to accept this as the best answer because it is a rather special solution to the question I originally made. But I nonetheless felt it could be useful for any programmer trying to achieve this while also thinking about search auto-completion.
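A minimal sketch of the second-core idea, assuming a dedicated core whose schema has `query_text`, `search_count`, and `last_searched` fields (all hypothetical names) and supports atomic updates. The code only builds the JSON body that the application would post to that core's update handler each time a search is made; the HTTP call itself is not shown, and how `inc` behaves for a counter that does not exist yet should be checked against your Solr version.

```python
import json
from datetime import datetime, timezone


def query_log_update(query, now=None):
    """Build an atomic-update document that records one executed search."""
    now = now or datetime.now(timezone.utc)
    normalized = " ".join(query.lower().split())  # collapse case and whitespace
    return {
        "id": normalized,                      # one document per distinct query
        "query_text": {"set": normalized},
        "search_count": {"inc": 1},            # atomic increment of the counter
        "last_searched": {"set": now.strftime("%Y-%m-%dT%H:%M:%SZ")},
    }


payload = json.dumps([query_log_update("  Red   Shoes ")])
print(payload)
```

Using the normalized query as the document id means repeated searches update one document instead of piling up duplicates, which also makes the core usable as an auto-complete source sorted by `search_count`.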

How useful is Lucene/Solr in database search?

I am new to development and I need your advice.
Our student team is going to develop an application for online restaurant booking, which will also include a search tool (restaurant and dish search).
We want to use a modern search tool like Lucene, but we are not sure if it is what we really need.
As far as we know, Lucene is more for text search with different kinds of indexes and so on, while our app will search in a database. BUT, if we want to add new features in the future, I guess we need a good search engine foundation today.
So, let me know if Lucene is able to do "select" operations or something like it, or whether this technology is just for text searches.
Second question: what can you advise on implementing this feature? Where should we start?
Thank you in advance.
It all depends. You usually don't start with Lucene and Solr, you use it to attain a goal or implement a specific behavior you need. Usually Solr is your secondary storage, built from your primary database - i.e. you're inserting data into Solr to solve a specific need, for example proper full text search with relevancy scoring.
If you're just starting up, go with the technology you know - i.e. usually a regular RDBMS. You can then attach Solr if you need those features that they're really good at, and wait with introducing new technology until it's necessary. The need first, then the technology. Maybe Lucene/Solr isn't the right technology for what you end up needing when you get to that point.
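To make the "start with a regular RDBMS" advice concrete, here is a small sketch using SQLite from the Python standard library. The `dishes` table, its columns, and the sample rows are invented for illustration; a plain `LIKE` query like this covers simple lookups, and it is only when you need relevancy scoring, stemming, or typo tolerance that a search engine starts to pay off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE dishes (id INTEGER PRIMARY KEY, restaurant TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO dishes (restaurant, name) VALUES (?, ?)",
    [
        ("Trattoria Roma", "Margherita pizza"),
        ("Trattoria Roma", "Spaghetti carbonara"),
        ("Sushi Bar", "Salmon nigiri"),
        ("Pizza Place", "Pepperoni Pizza"),
    ],
)


def search_dishes(conn, term):
    """Substring search on dish names using a parameterized LIKE query."""
    rows = conn.execute(
        "SELECT restaurant, name FROM dishes "
        "WHERE name LIKE ? ORDER BY restaurant, name",
        (f"%{term}%",),
    )
    return rows.fetchall()


print(search_dishes(conn, "pizza"))
```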
One of the main tenets of modern development is "YAGNI" - You Ain't Gonna Need It. You implement features when you need them, not for some imagined behavior that may or may not show up down the road.

Which solution for a detailed product search?

I need to build some kind of product search and I'm not sure which way I should go.
Requirements:
Proximity Search
Custom Ranking
Auto-correct suggestions, like in Google: when you type "Winipedia", it suggests "Wikipedia".
Indexing PDFs as field value of a search entity
German Language Support for Autocorrect-Suggestions
Auto-complete support
I tried it with AWS CloudSearch, but their support sucks if you don't pay extra for support and they don't support German yet, nor Auto-complete.
Is there any search solution with all the functions I need? Elasticsearch looks good, but I can't find any detailed feature list about it.
Thanks in advance for any help!
Regards
Nils
Both Solr and Elasticsearch have the features you need; choosing between them is only a matter of taste. :-)
There are 3 actively marketed solutions in the market:
1. Swiftype
2. Easyask
3. Unbxd
All these solutions handle it pretty well and better than Lucene and Solr (tried and tested). Go for Lucene / Solr if you have the time and money to improve it on your own... or just buy an off-the-shelf solution.

how to implement full text search in database

I understand that full text indexing and search for a database can be enabled by a lot of pre-packaged products. However, just out of academic curiosity, I wonder how those full text indexes are actually implemented. I have tried to google for results, with little success. Any feedback would be much appreciated.
Full text searches are supported by quite a few database engines these days as a core feature.
As for implementation, I think your best bet is to check out PostgreSQL full text search, as you can:
find a lot of material on how it is implemented
actually change and play with the parsers (for example, to optimize for a certain domain)
Further details and concepts are explained on Wikipedia under full text indexes. You can also check out open source and free full text search engines, as you will normally find supporting documentation explaining their inner workings too (I have heard good things about Lucene/Solr from this list).
Probably by creating dictionaries of "words" and maybe a bit of lexical analysis. (Note that fulltext searches whole words and not parts of words, so indexing may be constrained to that.)
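The "dictionary of words" idea is usually called an inverted index: a map from each word to the set of documents containing it. The sketch below is a toy version under simplifying assumptions (lowercasing and a regex tokenizer, no stemming, no ranking); real engines such as PostgreSQL or Lucene add much more on top. The sample documents and function names are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Very simple lexical analysis: lowercase and split into whole words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word of the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {
    1: "PostgreSQL supports full text search",
    2: "Lucene builds an inverted index for text",
    3: "Full text indexes map words to documents",
}
index = build_index(docs)
print(search(index, "full text"))
```

Because the index is keyed by whole words, a query for "tex" finds nothing here, which is the whole-word constraint mentioned above.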

automatic documents tagging related

I started working on a project in which I must tag documents with keywords, and it is really hard and time consuming if you do it manually (especially if you have thousands of documents). So I am planning to automate the process (knowing that the result would not be perfect, but at least it gives you some suggested tags).
In the latest Firefox version they implemented a system like this (when you bookmark a page, it suggests some tags).
The Yahoo term extraction service is also a great example.
So if anybody can help me get around this problem I would really appreciate the help. Or if someone knows about the Firefox tagging system, a little bit of help would be great.
Would a statistical algorithm work? Something Bayesian perhaps? I know they're used in spam filtering, maybe you can adapt a Bayes filter to suit your needs.
At the very least, you could suggest words that are used frequently but are not common words in English (he, she, I, and, it, then, or, etc...)
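The frequency-based suggestion can be sketched in a few lines. This is a deliberately naive baseline, not a Bayesian approach: the `STOPWORDS` set is a tiny hand-picked sample, the `suggest_tags` name and the length cutoff are arbitrary choices for this example, and a real system would want a proper stop word list and probably some weighting against a larger corpus.

```python
import re
from collections import Counter

# Small illustrative stop word list; a real one would be much longer.
STOPWORDS = {
    "a", "an", "and", "are", "he", "i", "in", "is", "it", "of",
    "on", "or", "she", "the", "then", "to",
}


def suggest_tags(text, limit=3):
    """Suggest the most frequent words that are not common English words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(limit)]


sample = (
    "Solr and Lucene are search engines. "
    "Solr is built on Lucene, and Solr adds a search server."
)
print(suggest_tags(sample))
```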

Resources