Which solution for a detailed product search? - solr

I need to build some kind of product search and I'm not sure which way I should go.
Requirements:
Proximity Search
Custom Ranking
Auto-correct-Suggestions, like in Google: when you type "Winipedia", it suggests "Wikipedia".
Indexing PDFs as field value of a search entity
German Language Support for Autocorrect-Suggestions
Auto-complete support
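To make the auto-correct requirement concrete, here is a minimal sketch using only Python's standard library. The vocabulary and the `suggest` helper are hypothetical names chosen for illustration; this is a toy, not how any particular search engine implements suggestions.

```python
import difflib

# Hypothetical vocabulary; a real system would draw this from the indexed terms.
VOCABULARY = ["wikipedia", "wikimedia", "encyclopedia", "windows"]


def suggest(term, vocabulary=VOCABULARY, cutoff=0.6):
    """Return the most similar known term, or None if nothing is close enough."""
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest("Winipedia"))  # wikipedia
```

For German suggestions the same idea applies as long as the vocabulary itself is built from German terms.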
I tried it with AWS CloudSearch, but their support sucks if you don't pay extra for support and they don't support German yet, nor Auto-complete.
Is there any search solution with all the functions I need? Elasticsearch looks good, but I can't find any detailed feature list about it.
Thanks in advance for any help!
Regards
Nils

Both Solr and Elasticsearch have the features you need; choosing between them is only a matter of taste. :-)

There are three actively marketed solutions:
1. Swiftype
2. Easyask
3. Unbxd
All these solutions handle it pretty well, and better than Lucene and Solr (tried and tested). Go for Lucene/Solr if you have the time and money to improve it on your own, or just buy an off-the-shelf solution.

Related

How useful is Lucene/Solr in database search?

I am new to development and I need your advice.
Our student team is going to develop an application for online restaurant booking, which will also include a search tool (restaurant and dish search).
We want to use a modern search tool like Lucene, but we are not sure if it is what we really need.
As far as we know, Lucene is mainly meant for text search with different kinds of indexes and so on, while our app will search in a database. But if we want to add new features in the future, I guess we need a good search engine foundation today.
So, let me know if Lucene is able to do "select" operations or something like it, or is this technology just for text searches?
Second question: what can you advise for implementing this feature? Where should we start?
Thank you in advance.
It all depends. You usually don't start with Lucene and Solr, you use it to attain a goal or implement a specific behavior you need. Usually Solr is your secondary storage, built from your primary database - i.e. you're inserting data into Solr to solve a specific need, for example proper full text search with relevancy scoring.
If you're just starting up, go with the technology you know - i.e. usually a regular RDBMS. You can then attach Solr if you need the features it's really good at, and wait with introducing new technology until it's necessary. The need first, then the technology. Maybe Lucene/Solr isn't the right technology for what you end up needing when you get to that point.
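As a rough illustration of starting with a regular RDBMS, the sketch below runs a restaurant and dish search with a plain SQL `LIKE` query in SQLite. The table name, column names, and sample rows are all made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical schema and sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dishes (restaurant TEXT, dish TEXT)")
conn.executemany(
    "INSERT INTO dishes VALUES (?, ?)",
    [
        ("Trattoria Roma", "Spaghetti Carbonara"),
        ("Sushi Go", "Salmon Nigiri"),
        ("Roma Pizza", "Pizza Margherita"),
    ],
)


def search(term):
    """Substring match on restaurant or dish name (case-insensitive for ASCII in SQLite)."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT restaurant, dish FROM dishes "
        "WHERE restaurant LIKE ? OR dish LIKE ? ORDER BY restaurant",
        (pattern, pattern),
    )
    return rows.fetchall()


print(search("roma"))
```

This gives no relevancy ranking or typo tolerance, which is exactly the kind of need that would later justify adding Solr next to the database.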
One of the main tenets of modern development is "YAGNI" - You Ain't Gonna Need It. You implement features when you need them, not for some imagined behavior that may or may not show up down the road.

Misspelling & synonym support for Azure Search?

I've seen threads discussing both of these topics:
Does Azure Search handle synonyms
Fuzzy Search in the Search API
I see that Liam Cavanagh from the Azure Search team seems to be the guy who's answered queries on these threads.
Liam, are you able to confirm the following yet please:
When full synonym support will be added to Azure Search
Do you definitely plan to support synonyms with Azure Search, or is it possible that you will recommend that customers use the Bing Synonyms product instead?
Do you have any plans to go beyond fuzzy logic and offer more advanced support for misspellings (i.e. multiple letters missing or in the wrong order, which stemming won't cover)?
Many thanks,
Ali
I don't know why you got a negative vote as I think these are really good questions. Let me answer your questions as best as I can:
You are correct that we have not implemented "full synonym support". It is one of the most highly requested features, so it is definitely something that we have on our near-term list, although I am sorry that I cannot provide a date yet. If you have time, please cast your vote for this here: http://feedback.azure.com/forums/263029-azure-search/suggestions/8410635-support-custom-dictionary In the meantime, there are "hacks" that are far from perfect but can get you part of the way. One example is to add a Collection field and then populate it with the relevant synonyms for each document.
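A minimal sketch of that Collection-field workaround, in plain Python: before a document is sent to the index, a list-valued field is filled with synonyms of the words in its title. The synonym table, the field name `synonyms`, and the `add_synonyms` helper are assumptions made for illustration; the index schema would need a matching searchable collection-of-strings field.

```python
# Hypothetical synonym table; in practice this would be maintained by hand
# or loaded from a dictionary file.
SYNONYMS = {"sofa": ["couch", "settee"], "tv": ["television"]}


def add_synonyms(doc, text_field="title", synonyms=SYNONYMS):
    """Return a copy of doc with a 'synonyms' list derived from text_field."""
    extra = []
    for word in doc[text_field].lower().split():
        for synonym in synonyms.get(word, []):
            if synonym not in extra:
                extra.append(synonym)
    return {**doc, "synonyms": extra}


doc = add_synonyms({"id": "1", "title": "Leather sofa"})
print(doc)
```

A search for "couch" would then match this document through the extra field, at the cost of re-indexing documents whenever the synonym table changes.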
I cannot say that this is a "definite" feature, but given how often we hear this request, hopefully I have given you an insight into the likelihood that it will be implemented.
I am curious if you have tried our brand new Lucene Query Expression support (https://msdn.microsoft.com/en-us/library/mt589323.aspx)? There are some really great capabilities for fuzzy search and also capabilities to do things like RegEx searches, etc. This is pretty awesome (IMO).
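For context on what fuzzy search covers: in Lucene query syntax a fuzzy term is written with a trailing tilde (for example `wikipedia~`), and matching is based on an edit distance between terms. The function below is only an illustrative sketch of such a distance (the optimal string alignment variant, which counts an adjacent swap as one edit); it is not the service's actual implementation.

```python
def edit_distance(a, b):
    """Optimal string alignment distance between a and b.

    Counts insertions, deletions, substitutions, and swaps of two
    adjacent characters as one edit each.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[rows - 1][cols - 1]


print(edit_distance("winipedia", "wikipedia"))  # 1
print(edit_distance("wikipeida", "wikipedia"))  # 1
print(edit_distance("wkpedia", "wikipedia"))    # 2
```

Misspellings with several missing or reordered letters quickly exceed a small edit budget, which is where fuzzy matching alone stops helping.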
I hope this helps, and I am sorry that I am not yet able to give more definitive dates on some of these questions.
Liam

Can GSA use data indexed by Apache Solr for search as a combined solution?

It has been observed that Google does not provide good indexing through its enterprise search solution, the Google Search Appliance, whereas Apache Solr has good indexing capabilities. Can we use Apache Solr to index documents and then search those documents through the GSA server, so that we get the best of both worlds? Kindly give your thoughts.
Can you please provide more details on why you think the GSA "does not provide good indexing"?
The GSA is generally recognised as being the best, or at least one of the best, when it comes to result relevancy. When it comes to non-web content, Google supply multiple connectors to allow you to index this content in the GSA, and if you have a content source that is neither web based nor covered by one of the Google connectors, it is not difficult to write your own.
So I'm not sure why you think the indexing is not good; it would be really helpful if you could elaborate.
Mohan is incorrect when he says that you cannot serve Solr content via a GSA; you certainly can do this. What you will need to do is create a onebox module so that you can federate Solr results in realtime, and they will be presented to the right of the main GSA results.
What is your data source?
If it is a website crawl, then to the best of my limited knowledge the GSA provides more sophisticated crawling/indexing capabilities for websites than Solr, because Solr needs an external toolkit such as Tika or Nutch for crawling web resources. The GSA, on the other hand, has its own crawler, which makes crawling simple and effective.
Regarding your question on indexing through Solr and serving through the GSA, it is possible through a onebox module (refer to BigMikeW's answer).
If you can provide some information about your data sources, it might help people to suggest the best solution to increase indexing capability in GSA.

Is it advisable to use Lucene for this?

I have a huge XML file, about 2GB in size, containing Resumes. There are thousands of resumes in this file, tagged properly. Right now I am using XPATH to query it. So is it advisable to use Lucene for the same instead of XPATH?
It depends on what your requirements are. If you need full-text searching and all the other great features of a full-blown search engine, Lucene is the way to go. I would recommend Solr, which builds on top of Lucene and provides a much better API and abstraction.
Like everything else technology related, it depends.
What Lucene gives you that you're not getting with XPath is the power of a full-text engine that supports, among other things, ranking, phrase queries, wildcard queries, etc.
Based on your use-case I would say that a full-text search engine makes sense. That's not to say that vanilla Lucene is the best way to go (there are, for example, other alternatives that build on Lucene).
2GB seems pretty small; at that size I would construct my own (minimal) inverted index :) There is no problem with using Lucene/Solr, though. Go ahead. It will help you once your records start doubling. However, at this scale (2GB) or even much larger, many real-life systems work fine with database text searches using the SQL LIKE keyword.
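For a sense of what a minimal inverted index looks like, here is a short Python sketch. It maps each token to the set of document ids containing it and answers AND queries by intersecting those sets. The sample resumes and function names are made up for illustration, and it has no ranking, stemming, or phrase support.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical resume snippets keyed by id.
resumes = {
    1: "Java developer with Lucene experience",
    2: "Python developer, Solr and Lucene",
    3: "Project manager",
}
index = build_index(resumes)
print(sorted(search(index, "lucene developer")))  # [1, 2]
```

Everything a full engine adds on top (relevancy scoring, phrase and wildcard queries, incremental updates) is what makes Lucene or Solr worth the setup once the data grows.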

Differences between sunspot and tire gems

Currently I am using Thinking Sphinx for search. Now I'm considering using Sunspot or Tire because they automatically index new content.
Are there any performance differences between the two? Is there anything else I should be concerned with?
Obviously the first difference is that you want to decide which search engine you think is best for your purposes: SOLR or Elasticsearch. We're using SOLR via Sunspot right now, but we're thinking seriously about moving to Elasticsearch because it feels like a better match for the sorts of web app functionality we want. It was incredibly easy to set up Tire, install the attachments plugin, and get search operating against data both in the database and in PDF attachments, with highlighting (now working thanks to another answer here on SO). Also, from a development/debugging point of view being able to use curl to test queries and see results is just great.
From the point of view of coding in a Rails app, you're right that both Sunspot and Tire are very similar. They both use the idea of a searchable/mapping block that defines what fields to index and how, and then performing a search is quite similar. As far as performance goes, I might give a bit of advantage to Tire, partly because the way it paginates and indexes in bulk is pretty slick (via the rake tire:import task). The ability in tire to control the indexing contents via to_json is very flexible as well.
Ultimately I think probably Sunspot and Tire are close enough that the choice between SOLR vs Elasticsearch is where you'll really end up making your decision.
