I am trying to optimize Solr.
The default solrconfig.xml that comes with solr > collection1 has a lot of libs included that I don't really need. Perhaps someone could help me identify their purpose. (I only import via DIH.)
Please tell me what's in these:
contrib/extraction/lib
solr-cell-
contrib/clustering/lib
solr-clustering-
contrib/langid/lib/
solr-langid
contrib/extraction/lib
solr-cell-*
These are the Solr Cell libraries, which integrate with Apache Tika and help you index rich documents, e.g. Microsoft Word, Excel, etc.
contrib/clustering/lib
solr-clustering-
Solr clustering provides the clustering support integrated with Carrot2.
Clustering helps you group documents by topic, do entity extraction, and much more.
contrib/langid/lib/
solr-langid
This is Solr Language ID, for language detection. It adds the ability to detect the language of a document before indexing and then make appropriate decisions about analysis, etc.
Just exclude the jars if you are not using any of the above features, and be sure to remove the corresponding mappings from the Solr configuration files as well.
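For reference, the stock solrconfig.xml pulls those modules in with <lib> directives roughly like the ones below (the exact relative paths vary by Solr version and directory layout, so treat this as a sketch of the 4.x-era default rather than an exact copy). Commenting them out, along with the related handlers such as /update/extract for Solr Cell and the clustering search component, removes the features cleanly:

    <!-- Solr Cell / Tika: rich-document extraction -->
    <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
    <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />

    <!-- Carrot2 clustering -->
    <lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
    <lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />

    <!-- Language identification -->
    <lib dir="../../../contrib/langid/lib/" regex=".*\.jar" />
    <lib dir="../../../dist/" regex="solr-langid-\d.*\.jar" />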
Related
What is the level of effort to integrate a later version of Solr (6+) with Crafter CMS (2.5+)?
My approach would be to get the default solrconfig.xml for the old and new Solr versions and compare them, paying particular attention to the parts the CMS seems to be using.
solrconfig.xml can be over 2000 lines, so the comparison can seem overwhelming, but it is not so bad if you focus on the relevant parts. You may find that nothing much has changed that affects you. Sorry for being vague here.
If you run into schema problems, your easy way out is to remove schema.xml and enable managed-schema mode. That can simplify configuration immensely.
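If you want to see what that switch involves, enabling the managed schema is just a schemaFactory entry in solrconfig.xml (the default configsets in Solr 6+ already ship this way); a minimal sketch:

    <!-- solrconfig.xml: use the managed schema instead of a hand-edited schema.xml.
         On startup Solr converts an existing schema.xml into a 'managed-schema' file. -->
    <schemaFactory class="ManagedIndexSchemaFactory">
      <bool name="mutable">true</bool>
      <str name="managedSchemaResourceName">managed-schema</str>
    </schemaFactory>

With mutable set to true you can then add or change fields through the Schema API instead of editing files by hand.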
Have a look at the great new Suggester and Highlighter features. But using them will of course increase your 'level of effort'. HTH -- Rick
Does anybody know how to have Lucene and Solr together in the same Sitecore installation?
Sitecore states that it is possible here:
https://doc.sitecore.net/sitecore_experience_platform/setting_up__maintaining/search_and_indexing/indexing/using_solr_or_lucene
You can mix Lucene and Solr, and, for example, use Solr for xDB and Lucene for content search at the same time. If an index is small, it is much easier to manage as a Lucene index because there is little to no overhead to set it up.
But there is no reference on how to configure it.
Any advice is welcome.
Cheers!
In other words, your analytics indexes will use Solr and your content search indexes will use Lucene.
To configure your analytics indexes to use Solr, you can check the following documentation from Sitecore: https://doc.sitecore.net/sitecore_experience_platform/setting_up__maintaining/xdb/configuring_servers/configure_a_processing_server#_Solr_configuration
By default, Sitecore already configures Lucene for Content Search, so no change is required there.
However, I am not sure that Solr and Lucene can both be used for Content Search or xDB at the same time, because of how the configuration works. For example, Content Search uses the index configurations for master, web, and core. If you decide to use Solr for Content Search, you will need to disable the Lucene configuration files in the Include folder.
Thanks
Elasticsearch has a percolator for prospective search. Does Solr have a similar feature where you define your queries upfront? If not, is there an effective way of implementing this myself on top of the existing Solr features?
Besides what BunkerMentality said, it is not hard to build your own percolator. What you need:
Are the queries you want to run easy to model in Lucene-only syntax? If so, you are good; if not, you need to convert them to Lucene-only queries first. Build them, and keep them in memory as Lucene Query objects.
When a doc arrives:
build a MemoryIndex containing only that single doc
run all your queries on the index
I have done this for a system ingesting millions of docs a day and it worked fine; a minimal sketch is shown below.
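To make that concrete, here is a minimal, self-contained sketch in plain Lucene (class name, field names, and queries are made up for illustration; it assumes a modern Lucene, roughly 5+, with lucene-core, lucene-memory, lucene-queryparser and the analyzers module on the classpath):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.memory.MemoryIndex;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.Query;

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class SimplePercolator {
        public static void main(String[] args) throws Exception {
            StandardAnalyzer analyzer = new StandardAnalyzer();

            // The "prospective" queries: parsed once, kept in memory as Lucene Query objects.
            QueryParser parser = new QueryParser("body", analyzer);
            Map<String, Query> registered = new LinkedHashMap<>();
            registered.put("solr-releases", parser.parse("body:solr AND body:release"));
            registered.put("anything-lucene", parser.parse("body:lucene"));

            // A document arrives: index just that one doc into a throwaway MemoryIndex...
            MemoryIndex doc = new MemoryIndex();
            doc.addField("body", "Lucene and Solr release notes for the new version", analyzer);

            // ...then run every stored query against that single-document index.
            for (Map.Entry<String, Query> e : registered.entrySet()) {
                if (doc.search(e.getValue()) > 0.0f) {
                    System.out.println("matched stored query: " + e.getKey());
                }
            }
        }
    }

A MemoryIndex is cheap to build per document, which is what makes this pattern viable at high ingest rates.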
It's listed as an open feature request, SOLR-4587, in the Solr JIRA, but it doesn't seem like any work has started on it yet.
There is a link in the comments there to a separate project called Luwak that seems to implement some features similar to percolator.
If it is still relevant, you can use this
It's a Solr update processor that is based on Luwak.
In short, I need to search against my Riak buckets via Solr. The only problem is that, by default, Solr searches are case-sensitive. After some digging, I see that I need to write a custom Solr text analyzer schema. Does anyone have any good references for writing search analyzer schemas?
And finally, when installing a new schema for an index, is it necessary to re-index all objects in a bucket for prior results to show up in a search that uses the new schema?
RTFM fail.... I swear though, getting to this page was not easy
http://docs.basho.com/riak/latest/dev/advanced/search-schema/#Defining-a-Schema
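For anyone else landing here, the heart of such a schema is simply an analyzer chain that lowercases tokens; with a single <analyzer> block the same chain is applied at both index and query time, which is what makes matching case-insensitive. A minimal sketch in standard Solr schema syntax (the text_ci type name and the example field are made up):

    <!-- schema.xml: a case-insensitive text type -->
    <fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- then use it for the fields you search on -->
    <field name="name_ci" type="text_ci" indexed="true" stored="true"/>

As for the second question: an analyzer change only affects documents as they are indexed, so existing objects have to be re-indexed before they show up under the new schema.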
Do you know if I can get synonyms.txt files for all the languages supported by Solr?
Thanks for your help.
We were previously using Verity, which provides a dictionary of synonyms for each supported language, but we may want to move to Solr/Lucene.
I know that we can provide a custom synonym list, but that is not what I want. I am looking for a way to have a default dictionary of synonyms for each language supported by Lucene.
There is no 'out of the box' synonym resource provided for all the languages.
At least for some of them you have WordNet (which is free); see the Solr wiki for WordNet usage.
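As a concrete example of the WordNet route: the synonym filter can read the WordNet prolog export (wn_s.pl) directly via its format attribute. A hedged sketch for an English field type, using the Solr 4/5-era SynonymFilterFactory (newer releases would use SynonymGraphFilterFactory), with wn_s.pl dropped into the core's conf/ directory:

    <fieldType name="text_en_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- wn_s.pl is the WordNet prolog synset file -->
        <filter class="solr.SynonymFilterFactory" synonyms="wn_s.pl"
                format="wordnet" ignoreCase="true" expand="true"/>
      </analyzer>
    </fieldType>

This only covers English, though, which is exactly the limitation described above.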
A list of synonyms for any language is going to be very specific to the use cases around a given set of indexed items. For that reason it would not be practical to ship prebuilt, language-specific versions of these files. Even the synonyms.txt that comes with the Solr distribution is only built out enough to show examples of how synonyms can be constructed.