Solr - Migrate Documents from one Collection to another existing one

I need to move all Solr documents (about 500,000) from one collection to another collection that already exists.
I have tried the Collections API MIGRATE action but cannot get the routing key right. I have tried:
curl 'http://localhost:8983/solr/admin/collections?action=MIGRATE&collection=oldCollection&target.collection=newCollection&split.key=!'
I have Solr 4.10.3, installed as part of a Cloudera installation.

Copy your existing oldCollection and rename the copy to newCollection.
After that, you may need to update some config files to match.
Or create a new one using the API below:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1

Both the question and the answer above are quite old. Starting with Solr 8.1, there is a feature specifically for this purpose: the REINDEXCOLLECTION API, which can reindex documents from a source collection into a target collection and has many configurable options. Here is the link to the official documentation: https://lucene.apache.org/solr/guide/8_1/collections-api.html#reindexcollection
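As a rough illustration of the REINDEXCOLLECTION call, here is a minimal Python sketch that only builds the request URL. The host and collection names are taken from the question; the helper name reindex_url is hypothetical, and any extra options are passed through unchecked. How an already existing target collection is treated is described in the linked documentation and is not verified here.

```python
from urllib.parse import urlencode


def reindex_url(base_url, source, target, **options):
    """Build a Collections API URL for REINDEXCOLLECTION (Solr 8.1+).

    'name' is the source collection and 'target' the destination.
    Extra keyword options are appended as-is (assumption: the caller
    has checked them against the official documentation).
    """
    params = {"action": "REINDEXCOLLECTION", "name": source, "target": target}
    params.update(options)
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Names below come from the question; adjust to your own setup.
url = reindex_url("http://localhost:8983/solr", "oldCollection", "newCollection")
print(url)
```

The resulting URL can be requested with curl or any HTTP client, in the same way as the MIGRATE call in the question.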

Related

upgrade solr from 4.2.1 to 5.3.1

I've been tasked with migrating from our Solr 4.2.1 server to a new Solr server, 5.3.1. I was hoping I could just pick up the cores and move them over with a little bit of file editing. But alas, I can't quite figure it out.
I have tried moving a single core and creating a core.properties file with the name of the core, and I get:
testcore: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error loading class 'solr.JsonUpdateRequestHandler'
Any thoughts as to what the problem might be? Any thoughts would be appreciated, thank you!
I am in the final stages of a similar upgrade; here is how I suggest you proceed.
Install both versions side by side and create the collection in the new Solr.
Take the default schema/solrconfig from the new Solr and move your settings into it from your old schema/solrconfig. The formatting changed, so you will need to move all of your config manually.
Make sure that works.
Move the indexes. Once your solrconfig and schema match up, you should be able to use your old indexes (data directory).
To complete the upgrade you will need to re-index into a new but similar collection. This will upgrade the underlying Lucene indexes. Your new version of Solr has cursor mark support, so this becomes much simpler, especially if you are using collection aliases.
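The cursor mark paging mentioned in the last step can be sketched roughly as follows, in Python. The function fetch_page is a hypothetical callable supplied by the caller: it would send the given parameters to the collection's /select handler and return the parsed JSON response. The sketch assumes the uniqueKey field is called id. An in-memory stand-in is used here so the loop can be run without a server.

```python
def iter_all_docs(fetch_page, unique_key="id", rows=500):
    """Yield every document by following Solr's cursorMark.

    Paging starts at cursorMark=* and stops when Solr returns the
    same cursor mark that was sent, which signals the end of results.
    The sort must include the uniqueKey field.
    """
    cursor = "*"
    while True:
        params = {
            "q": "*:*",
            "sort": f"{unique_key} asc",
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        }
        page = fetch_page(params)
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            break
        cursor = next_cursor


def fake_fetch(params):
    """In-memory stand-in for a real HTTP call (illustration only)."""
    pages = {
        "*": ([{"id": "1"}, {"id": "2"}], "A"),
        "A": ([{"id": "3"}], "B"),
        "B": ([], "B"),  # an empty page repeats the cursor: done
    }
    docs, next_cursor = pages[params["cursorMark"]]
    return {"response": {"docs": docs}, "nextCursorMark": next_cursor}


ids = [doc["id"] for doc in iter_all_docs(fake_fetch)]
print(ids)
```

Each yielded document can then be posted to the new collection; once the new collection is complete, a collection alias can be switched over to it.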
JSON does not have its own request handler any longer (changed in 4.x, removed in 5.x). It has now been merged into the standard solr.UpdateRequestHandler, and the request handler is selected internally based on the Content-Type header of the request.
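To illustrate the Content-Type based selection, here is a small Python sketch that prepares a JSON update request for the standard /update handler. The core name testcore comes from the error message in the question; the host, the helper name build_json_update and the sample document are assumptions. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_json_update(base_url, core, docs):
    """Prepare a POST to /update carrying JSON documents.

    The Content-Type header is what tells solr.UpdateRequestHandler
    to parse the body as JSON; no separate JSON handler is needed.
    """
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{base_url}/{core}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_json_update("http://localhost:8983/solr", "testcore", [{"id": "1"}])
print(req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

If the old solrconfig.xml still declares a handler with class solr.JsonUpdateRequestHandler, that declaration is the likely source of the "Error loading class" message and would need to be removed or pointed at solr.UpdateRequestHandler.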

Solr luceneMatchVersion syntax

I have Solr 4.10 with a collection whose solrconfig.xml has the following value for <luceneMatchVersion>:
<luceneMatchVersion>4.7</luceneMatchVersion>
Is this correct? I saw other examples that have values such as LUCENE_35. I also need to know how to express LUCENE_xx for my current Solr version.
You should use:
<luceneMatchVersion>4.10.4</luceneMatchVersion>
I recommend that you check your current Solr version; in my case it was 4.10.4.
If you are going to reindex, then both numbers should match. The only reason you might want them to differ is if you had an index created with, say, Lucene 4.7; then you would have
<luceneMatchVersion>4.7</luceneMatchVersion>
Then you upgrade Lucene to 4.10.
Now, if the changes between 4.7 and 4.10 include differences in analysis (the same sentence analysed in both versions produces different output), you might want to keep the version number at 4.7. Otherwise some queries that contain affected terms might not work, because they were analysed at index time differently than at query time. You have to assess how critical that issue might be.
That is why the recommendation is to upgrade, change the setting to the current number, and reindex. This way you are sure to avoid any issue.
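A quick way to spot a mismatch between the configured and the installed version is to read the element out of solrconfig.xml. The Python sketch below does this on an inline sample; the sample XML and the installed version string are assumptions standing in for your real file and server.

```python
import xml.etree.ElementTree as ET


def read_lucene_match_version(solrconfig_xml):
    """Return the text of <luceneMatchVersion>, or None if absent."""
    root = ET.fromstring(solrconfig_xml)
    node = root.find("luceneMatchVersion")
    if node is None or not node.text:
        return None
    return node.text.strip()


# Inline sample standing in for a real solrconfig.xml (assumption).
sample = "<config><luceneMatchVersion>4.7</luceneMatchVersion></config>"
configured = read_lucene_match_version(sample)

installed = "4.10.4"  # assumption: the version your Solr reports
mismatch = configured != installed
print(configured, installed, mismatch)
```

A mismatch is not an error in itself; it only indicates that the setting should be raised and the data reindexed if you want both to agree.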
If anyone is using Drupal, the Search API Solr (search_api_solr) module has config templates by version in /sites/all/modules/search_api_solr/solr-conf/.
The template README.md states the following:
The solr-conf-templates directory contains config-set templates for different Solr versions.
These are templates and are not to be used as config-sets!
To get a functional config-set you need to generate it via the Drupal admin UI or with drush solr-gsc. See README.md in the module directory for details.
The module's README.md lists these instructions:
Make sure you have Apache Solr started and accessible (i.e. via port 8983). You can start it without having a core configured at this stage.
Visit Drupal configuration (/admin/config/search/search-api) and create a new Search API Server according to the search_api documentation using "Solr" as Backend and the connector that matches your setup. Input the correct core name (which you will create at step 4, below).
Download the config.zip from the server's details page or by using drush solr-gsc with proper options, for example for a server named "my_solr_server": drush solr-gsc my_solr_server config.zip 8.4.
Copy the config.zip to the Solr server and extract.
I generated a config file for 8.x, and it uses this:
<luceneMatchVersion>${solr.luceneMatchVersion:LUCENE_80}</luceneMatchVersion>

Custom stat aggregation and sorting function in solr

Please provide a suggestion or example for creating a custom stat (aggregate) function in Solr.
Every document in Solr has 30 String fields, 10 int fields and 1 binary field (a hashset that stores unique users).
I want to add a new stat function, UniqueUsers(fieldName), similar to the add/avg functions already available in Solr, to find the unique users across the Solr records matched by a search.
While googling, I found the issue ticket SOLR-5302, which is already resolved, but I didn't find any example of how to implement it.
https://issues.apache.org/jira/browse/SOLR-5302
Solr already has a stats component that does aggregation and combines it with facets.
http://wiki.apache.org/solr/StatsComponent
https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
If this doesn't do exactly what you need, you can copy that source code file (and any that it references which need changes) to your own custom component and compile a new project against the Solr jars that you can find in the dist/ directory of the download. I don't have precise instructions for doing this -- the person who sets that up must have extensive knowledge of how to build Java development projects. Then you must inform Solr about the new Component and use it in your request handler, by modifying solrconfig.xml.
The trunk version of the Solr source code already includes a new and more capable AnalyticsComponent - SOLR-5302. Because it is missing a little bit of functionality already present in the StatsComponent, I don't think it is available in a released version of Solr yet. You're welcome to take the patches and do the manual work that may be required so they work with a 4.x version.
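For reference, a request to the existing Stats Component looks roughly like the Python sketch below, which only builds the query URL. The host, collection and field names are hypothetical. This shows the built-in statistics combined with a facet; it does not implement the custom UniqueUsers function asked about.

```python
from urllib.parse import urlencode


def stats_query_url(base_url, collection, query, stats_field, facet_field=None):
    """Build a /select URL that asks the Stats Component for statistics.

    stats.field names the field to aggregate; stats.facet (optional)
    breaks the statistics down by the values of another field.
    """
    params = [
        ("q", query),
        ("rows", 0),  # only the statistics are wanted, not the documents
        ("stats", "true"),
        ("stats.field", stats_field),
        ("wt", "json"),
    ]
    if facet_field:
        params.append(("stats.facet", facet_field))
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical names, for illustration only.
url = stats_query_url(
    "http://localhost:8983/solr", "mycollection", "*:*", "int_field_1", "string_field_1"
)
print(url)
```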

Setting up Solr and Querying it

I am new to Solr.
I cannot find proper documentation that would help me understand what I need to add to solrconfig.xml and what should be removed.
My SolrDocument would contain id, field1 and field2. Of the two fields, I want to update only one. How do I do that? I tried a few things, but each one overwrites the entire document.
/update is not working.
I have to add documents and retrieve them from inside a Java class.
You can refer to the Solr Wiki page for solrconfig.xml; it is a good starting point for understanding the configuration options.
Solr does not really have an update concept; it always deletes the existing document and replaces it with the new one. A feature request to address this, SOLR-139, was opened years ago, and as of today it shows a fix version of 4.1. However, Solr 4.0 has a new feature, atomic updates, that you could try if this is critical for you. Note: Solr 4.0 is still a Beta.
'/update' not working -> do you mean not working since it is replacing the old document with new document or do you get error/exception ?
To add & retrieve documents from Java, you can use SolrJ. SolrJ is Java client to access Solr programmatically. SolrJ - Solr Wiki.
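As an illustration of the atomic update mentioned above, the Python sketch below builds the JSON body that changes a single field and leaves the rest of the document alone. The document id and the new value are hypothetical. It assumes Solr 4.0 or later with the update log enabled and the other fields stored, which atomic updates rely on.

```python
import json


def atomic_set(doc_id, field, value):
    """Build one atomic-update document that sets a single field.

    The {"set": value} wrapper tells Solr to replace only that field
    instead of overwriting the whole document.
    """
    return {"id": doc_id, field: {"set": value}}


# Hypothetical id and value; field1 is the field name from the question.
payload = json.dumps([atomic_set("doc-1", "field1", "new value")])
print(payload)
```

The payload would be sent as a POST to the /update handler with the Content-Type header set to application/json. SolrJ can express the same update from Java by adding a map with a "set" entry as the field value.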

Making one of Liferay communities (called sites) not indexed in solr

We are using Liferay (6.1.20 EE) with Solr search engine.
Right now Solr indexes everything. Can we somehow set up Solr (or Liferay) to prevent one Site from being indexed?
That is, none of the articles and documents on that Site would be indexed or present in Solr.
1) Should this be done with Solr configurations/schema filters before Index starts?
OR
2) Should it be customized in Liferay Indexer classes (with help of Hooks or EXT) to skip content being indexed.
Thanks for your thoughts and suggestions.
Regards,
Kris
You could create a custom version of the solr-web WAR file that you need to install to make the Liferay/SOLR integration work. In the WAR file you'll find SolrIndexWriterImpl. This is the place that everything passes through that will be indexed in SOLR. You could create your own custom implementation of this class that uses the information in the SearchContext parameter, that's passed into each method, to decide if something should be indexed or not.
The latest code for solr-web can be found here: http://svn.liferay.com/repos/public/plugins/trunk/webs/solr-web/
Based on this code I was also able to create a solr-web.war that works on the more recent SOLR versions instead of the ancient 1.4.1 version Liferay uses by default.
