Custom stat aggregation and sorting function in solr - solr

Please provide suggestion/example to create a custom stat[Aggregate] function in the Solr.
Every document in Solr is having 30 String, 10 int and 1 binary[hashset to store unique users] fields.
I want to add a new stat function (UniqueUsers(fieldName) like add/avg function already available in Solr) to find the unique across searched Solr records.
While googling, i found following Issue ticket SOLR-5302 ,which is already resolved ,but i dint find any example to implement it.
https://issues.apache.org/jira/browse/SOLR-5302

Solr already has a stats component that does aggregation and combines it with facets.
http://wiki.apache.org/solr/StatsComponent
https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
If this doesn't do exactly what you need, you can copy that source code file (and any that it references which need changes) to your own custom component and compile a new project against the Solr jars that you can find in the dist/ directory of the download. I don't have precise instructions for doing this -- the person who sets that up must have extensive knowledge of how to build Java development projects. Then you must inform Solr about the new Component and use it in your request handler, by modifying solrconfig.xml.
The trunk version of the Solr source code already includes a new and more capable AnalyticsComponent - SOLR-5302. Because it is missing a little bit of functionality already present in the StatsComponent, I don't think it is available in a released version of Solr yet. You're welcome to take the patches and do the manual work that may be required so they work with a 4.x version.

Related

Solr - Migrate Documents from one Collection to another existing one

I need to move all Solr Documents from one collection to another (already existing collection) - there are 500,000 documents.
I have tried the solr migrate but cannot get the routing key correct. I have tried:
curl 'http://localhost:8983/solr/admin/collections?action=MIGRATE&collection=oldCollection&target.collection=newCollection&split.key=!'
I have solr 4.10.3 installed in a cloudera installation.
Copy your existing oldCollection, and rename the as newCollection,
After that you may need to update some config files for the same.
Or create a new one using the below api
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1
The answer and the question are quite old, starting from 8.1 solr version, there is a feature specific for this purpose which is the reindexcollection api which can directly be used to reindex docs from source to a target collection with a lot of configurable options. Here is the link to the official doc : https://lucene.apache.org/solr/guide/8_1/collections-api.html#reindexcollection

Does SOLR support percolation

ElasticSearch has percolator for prospective search. Does SOLR have a similar feature where you define your query upfront? If not, is there an effective way of implementing this myself on top of the existing SOLR features?
besides what BunkerMentality said, it is not hard to build your own percolator, what you need:
Are the queries you want to run easy to model on Lucene only syntax? if so you are good, if not, you need to convert them to Lucene only. Built them, and keep them in memory as Lucene queries
When a doc arrives:
build a MemoryIndex containing only that single doc
run all your queries on the index
I have done this for a system ingesting millions docs a day and it worked fine.
It's listed as an open new feature, SOLR-4587, on Solr JIRA but it doesn't seem like any work has started on it yet.
There is a link in the comments there to a separate project called Luwak that seems to implement some features similar to percolator.
If it is still relevant, you can use this
It's SOLR Update Processor that based on Luwak

Solr luceneMatchVersion syntax

I have Solr 4.10 and I have collection on it with solorconfig.xml has the value for <luceneMatchVersion> as follows:
<luceneMatchVersion>4.7</luceneMatchVersion>
Is this correct? I saw other examples that has values such as LUCENE_35 What I need to know also, how could I express LUCENE_xx from my current Solr version?
You should use:
<luceneMatchVersion>4.10.4</luceneMatchVersion>
I recommend you to check your current solr version, in my case was 4.10.4.
if you are going to reindex, then both numbers should match. The only reason you might want to have them different, is if you had and index created with say Lucene 4.7, then you would have
<luceneMatchVersion>4.7</luceneMatchVersion>
Then, you upgrade lucene to 4.10.
Now, if among the changes in between 4.7 and 4.10 there are things that work differently regarding analysis (you get the same sentence analysed in both versions and get different output as a result), then, you might want to keep the version number at 4.7, otherwise some queries that contain affected terms might not work (as they were analysed at index time in a different way than at query time). You have to asses how critical that issue might be.
That is why the recommendation is to upgrade, change the setting to the current number, and reindex. This way you are sure to avoid any issue.
If anyone is using Drupal, the Search API Solr (search_api_solr) module has config templates by version in /sites/all/modules/search_api_solr/solr-conf/.
The template README.md states the following:
The solr-conf-templates directory contains config-set templates for
different Solr versions.
These are templates and are not to be used as config-sets!
To get a functional config-set you need to generate it via the Drupal
admin UI or with drush solr-gsc. See README.md in the module
directory for details.
The module's README.md lists these instructions:
Make sure you have Apache Solr started and accessible (i.e. via port 8983). You can start it without having a core configured at
this stage.
Visit Drupal configuration (/admin/config/search/search-api) and create a new Search API Server according to the search_api
documentation using "Solr" as Backend and the connector that
matches your setup. Input the correct core name (which you will
create at step 4, below).
Download the config.zip from the server's details page or by using drush solr-gsc with proper options, for example for a server named
"my_solr_server": drush solr-gsc my_solr_server config.zip 8.4.
Copy the config.zip to the Solr server and extract.
I generated a config file for 8.x, and it uses this:
<luceneMatchVersion>${solr.luceneMatchVersion:LUCENE_80}</luceneMatchVersion>

Is it possible to modify a schema using the rest API in Apache Solr?

I think the title is self-explanatory.
I don't see anything on the Apache Solr wiki that suggests you can maintain the schema of an Apache Solr instance using the ReST API, but maybe (hopefully) you know something I don't.
I just found a section on the Solr wiki where they describe this exact feature for release 4.4 (which is not released yet).
It does have some prerequisite configuration on the Solr instance, but it does allow you to add fields to the schema. Based on that information, I can't see why they won't eventually extend the functionality to allow you to delete as well. I guess we will have to wait and see.
Here is the link to that section: http://wiki.apache.org/solr/SchemaRESTAPI#Adding_fields_to_a_schema. It also references this JIRA issue: "In preparation for dynamic schema modification via REST API, add a "managed" schema facility".

Making one of Liferay communities (called sites) not indexed in solr

We are using Liferay (6.1.20 EE) with Solr search engine.
Now Solr indexes everything. Can we somehow set up Solr (or Liferay) to prevent one Site from being indexed?
It means all articles documents present on that Site would not be indexed and would not be present in Solr.
1) Should this be done with Solr configurations/schema filters before Index starts?
OR
2) Should it be customized in Liferay Indexer classes (with help of Hooks or EXT) to skip content being indexed.
Thanks for your thoughts and suggestions.
Regards,
Kris
You could create a custom version of the solr-web WAR file that you need to install to make the Liferay/SOLR integration work. In the WAR file you'll find SolrIndexWriterImpl. This is the place that everything passes through that will be indexed in SOLR. You could create your own custom implementation of this class that uses the information in the SearchContext parameter, that's passed into each method, to decide if something should be indexed or not.
The latest code for solr-web can be found here: http://svn.liferay.com/repos/public/plugins/trunk/webs/solr-web/
Based on this code I was also able to create a solr-web.war that works on the more recent SOLR versions instead of the ancient 1.4.1 version Liferay uses by default.

Resources