What is the level of effort to integrate a later version of Solr (6+) with Crafter CMS (2.5+)?
My approach would be to get the default solrconfig.xml for the old and new Solr versions. Compare them, in particular the parts that the CMS seems to be using.
solrconfig.xml can be over 2000 lines, so the comparison can seem overwhelming, but it is not so bad if you focus on the relevant parts. You may find that little has changed that affects you. Sorry for being vague here.
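To narrow the comparison down, a small script can report which top-level sections differ between the two files. This is only a rough sketch: the two inline configs are tiny made-up stand-ins for the real files, and `named_sections` and `compare_configs` are hypothetical helper names, not part of Solr.

```python
import xml.etree.ElementTree as ET


def named_sections(xml_text):
    """Map (tag, name attribute) -> serialized element for top-level children.

    Elements that share the same tag and name (e.g. several unnamed <lib>
    entries) overwrite each other here; good enough for a first pass.
    """
    root = ET.fromstring(xml_text)
    sections = {}
    for child in root:
        key = (child.tag, child.get("name", ""))
        # Remove all whitespace so formatting-only changes are ignored.
        sections[key] = "".join(ET.tostring(child, encoding="unicode").split())
    return sections


def compare_configs(old_xml, new_xml):
    """Return the section keys that were added, removed, or changed."""
    old, new = named_sections(old_xml), named_sections(new_xml)
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "changed": sorted(k for k in old if k in new and old[k] != new[k]),
    }


# Made-up miniature configs, for illustration only.
OLD = """<config>
  <luceneMatchVersion>4.10.4</luceneMatchVersion>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <requestHandler name="/admin/" class="solr.admin.AdminHandlers"/>
</config>"""

NEW = """<config>
  <luceneMatchVersion>6.6.0</luceneMatchVersion>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <requestHandler name="/suggest" class="solr.SearchHandler"/>
</config>"""

report = compare_configs(OLD, NEW)
for kind, keys in report.items():
    print(kind, keys)
```

In practice you would read the two real files from disk and then look only at the sections the CMS actually relies on.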
If you run into schema problems, the easy way out is to remove schema.xml and enable managed schema mode. That can simplify configuration immensely.
Have a look at the great new Suggester and Highlighter features. But using them will of course increase your 'level of effort'. HTH -- Rick
With Hybris Commerce 6.7, which includes Solr 7.7, I'm finding it very difficult to configure Solr so that the "Did you mean" suggestions meet business expectations. I read many articles on this and found many configuration parameters. Based on those, with meaningful changes, I expected to get something working. Unfortunately, I'm still searching for the configuration or approach that retailers like Flipkart or Amazon use. Below are the points that troubled me the most.
To my knowledge, spellcheck works per word across the entire search phrase. If the user searches for a single word with an understandable spelling mistake, Solr finds the correct word easily, e.g. Telvison, mobbile, etc.
If the user searches for a multi-word phrase, in some instances Hybris Solr returns no suggestion at all. Sometimes it returns suggestions that don't exist in the real world. E.g. for the misspelling aple watch, it suggests apple water. For bode speaker, it suggests body speaker. For water heatre, it suggests skywater theatre. For samsung note 10 lite, it suggests samsung note 10 litre. For red apple wath, it suggests red apple with. For red apple watch, it suggests led apple watch. And there are many more. Isn't it ridiculous?
I tried adding a WordBreakSolrSpellChecker dictionary alongside the existing DirectSolrSpellChecker, but it had no effect on the suggestions. I doubt Hybris allows this.
I also tried a FileBasedSpellChecker dictionary to maintain a separate text file, but Hybris seems to have a hard dependency that doesn't allow such changes.
I changed the dictionary to IndexBasedSpellChecker, but it threw an exception in the Solr admin console.
After playing with collate parameters, I figured out that Solr is giving me suggestions that don't have any direct search result (product). FYI, I used phrase search (not freetext) in my search implementation.
Standalone Solr offers many parameters. I studied and implemented those, but nothing helped, although conceptually they should work.
Can anyone please guide me on how to proceed? If you want, I can share my Solr configuration.
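On the collation point above (suggestions that return no products), stock Solr can test each candidate collation against the index before returning it. That is controlled by `spellcheck.maxCollationTries`, with `spellcheck.collateParam.*` overriding parameters for the test queries. The sketch below only builds such a request URL. The host, the core name `products`, and the specific values are assumptions, and whether Hybris passes these parameters through unchanged is not something this sketch can confirm.

```python
from urllib.parse import urlencode

# Hypothetical host and core name, for illustration only.
BASE_URL = "http://localhost:8983/solr/products/select"


def spellcheck_params(user_query, max_tries=10):
    """Build request parameters asking Solr for index-verified collations."""
    return {
        "q": user_query,
        "spellcheck": "true",
        "spellcheck.q": user_query,
        "spellcheck.collate": "true",
        # With a value above 0, Solr test-runs candidate collations and
        # returns only those that would produce hits.
        "spellcheck.maxCollationTries": str(max_tries),
        "spellcheck.maxCollations": "3",
        # Include the hit count of each collation in the response.
        "spellcheck.collateExtendedResults": "true",
        # Assumed choice: require every term to match in the test queries.
        "spellcheck.collateParam.q.op": "AND",
    }


url = BASE_URL + "?" + urlencode(spellcheck_params("aple watch"))
print(url)
```

Requiring all terms in the test queries is one way to filter out collations such as apple water when no product contains both words.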
I saw there was an article in the Apache wiki on OpenNLP for Solr.
Is it valid for the current Solr version, 5.3.1?
No, if you have a look at LUCENE-2899, you'll see that the code discussed was never added to trunk. You'll have to download/patch/update the code yourself if you're going to have it native to Solr.
It's probably a better idea to do all the NLP stuff outside of Solr, then index the result in a form suited for the task you're trying to solve.
Yes. It's better to keep it outside.
Here is a small project I tried.
https://github.com/john77eipe/DeepQA
Elasticsearch has a percolator for prospective search. Does Solr have a similar feature where you define your queries upfront? If not, is there an effective way of implementing this myself on top of the existing Solr features?
Besides what BunkerMentality said, it is not hard to build your own percolator. What you need:
Are the queries you want to run easy to model in Lucene-only syntax? If so, you are good; if not, you need to convert them to Lucene-only syntax. Build them, and keep them in memory as Lucene queries.
When a doc arrives:
build a MemoryIndex containing only that single doc
run all your queries on the index
I have done this for a system ingesting millions of docs a day and it worked fine.
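The flow above can be illustrated with a toy, pure-Python stand-in. It does not use Lucene: a real implementation would hold Lucene `Query` objects and use Lucene's `MemoryIndex` in Java. Here a term counter plays the role of the single-document index, and each stored query is simplified to a pair of required and prohibited terms. All names and queries are hypothetical.

```python
from collections import Counter


def build_single_doc_index(text):
    """Stand-in for a single-document MemoryIndex: term -> frequency."""
    return Counter(text.lower().split())


# Queries are built once and kept in memory.
# Each is modeled as (required terms, prohibited terms).
REGISTERED_QUERIES = {
    "apple-products": ({"apple"}, set()),
    "watches-not-used": ({"watch"}, {"used"}),
    "solr-news": ({"solr", "release"}, set()),
}


def matches(index, query):
    """True if all required terms are present and no prohibited term is."""
    required, prohibited = query
    return (all(term in index for term in required)
            and not any(term in index for term in prohibited))


def percolate(text, queries=REGISTERED_QUERIES):
    """Return the ids of every stored query that matches the incoming doc."""
    index = build_single_doc_index(text)  # holds only this one document
    return sorted(qid for qid, query in queries.items() if matches(index, query))


print(percolate("New Apple Watch announced"))  # ['apple-products', 'watches-not-used']
print(percolate("Used watch for sale"))        # []
```

The point of the design is that the index is tiny and thrown away after each document, so the cost per document is dominated by the number of stored queries.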
It's listed as an open new feature, SOLR-4587, on Solr JIRA but it doesn't seem like any work has started on it yet.
There is a link in the comments there to a separate project called Luwak that seems to implement some features similar to percolator.
If it is still relevant, you can use this
It's a Solr Update Processor based on Luwak.
I am trying to optimize Solr.
The default solrconfig that comes with solr>collection1 includes a lot of libs I don't really need. Perhaps someone could help me identify their purpose (I only import via DIH):
Please tell me what's in these:
contrib/extraction/lib
solr-cell-
contrib/clustering/lib
solr-clustering-
contrib/langid/lib/
solr-langid
contrib/extraction/lib
solr-cell-*
These are the Solr Cell libraries, which integrate with Tika and help you index rich documents, e.g. Microsoft Word, Excel, etc.
contrib/clustering/lib
solr-clustering-
Solr clustering provides the clustering support integrated with Carrot2.
Clustering helps you group documents and supports topic extraction, entity extraction, and much more.
contrib/langid/lib/
solr-langid
Solr Language Id for the Language detection. It adds the ability to detect the language of a document before indexing and then make appropriate decisions about analysis, etc.
Just exclude the jars if you are not using any of the above features and be sure you remove the mappings from the Solr configuration files as well.
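To find what needs removing, a short script can list the `<lib>` directives and the handler or component entries in solrconfig.xml that refer to these three features. This is a heuristic sketch: the sample config is illustrative rather than copied from a specific release, `find_removable` is a hypothetical helper, and the class-name hints are assumptions that may need adjusting for your version.

```python
import xml.etree.ElementTree as ET

# Contrib folders and jar prefixes named in the question.
UNUSED_CONTRIBS = ("extraction", "clustering", "langid")
UNUSED_JAR_PREFIXES = ("solr-cell-", "solr-clustering-", "solr-langid-")
# Assumed substrings of class names that belong to these features.
UNUSED_CLASS_HINTS = ("extraction", "clustering", "LanguageIdentifier")


def find_removable(solrconfig_xml):
    """Return (<lib> directives, class mappings) that refer to unused features."""
    root = ET.fromstring(solrconfig_xml)
    libs, mappings = [], []
    for el in root.iter():
        if el.tag == "lib":
            directory, regex = el.get("dir", ""), el.get("regex", "")
            if (any(f"contrib/{c}/" in directory for c in UNUSED_CONTRIBS)
                    or regex.startswith(UNUSED_JAR_PREFIXES)):
                libs.append((directory, regex))
        elif any(hint in el.get("class", "") for hint in UNUSED_CLASS_HINTS):
            mappings.append((el.tag, el.get("name", ""), el.get("class")))
    return libs, mappings


# Illustrative sample only; check it against your own solrconfig.xml.
SAMPLE = r"""<config>
  <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
  <lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
  <lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />
  <lib dir="../../../contrib/langid/lib/" regex=".*\.jar" />
  <lib dir="../../../dist/" regex="solr-langid-\d.*\.jar" />
  <lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />
  <requestHandler name="/update/extract"
                  class="solr.extraction.ExtractingRequestHandler" />
</config>"""

libs, mappings = find_removable(SAMPLE)
for entry in libs + mappings:
    print(entry)
```

The DIH `<lib>` line is left alone, since DIH is the one feature in use here.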
Are there any major differences between Solr 3.6 and Solr 4.0 other than new features? Am I safe using my existing queries (those that work in Solr 3.6) inside of Solr 4.0?
Are there any major differences between Solr 3.6 and Solr 4.0 other
than new features?
I find this question weird, to say the least. Bug fixes and new features are the whole point of releases!
You can look at the full changelog of the Solr release, which is available here. Don't forget that Solr and Lucene are released in unison, so you also need to look for relevant changes in both projects.
Am I safe using my existing queries (those that work in Solr 3.6)
inside of Solr 4.0?
Queries should be fine, but indices - probably not. Quoting javanna from another SO post:
The index format has changed, but Solr will take care of upgrading the
index. That happens automatically once you start Solr with your old
index. But after that the index cannot be read anymore by a previous
Solr/lucene version.
Ideally they should work.
You can check Changes.txt, which gives an idea of all the new features, changes, bug fixes, and optimizations.
If anything breaks, you can always refer to the changes to check whether anything related has been modified.