Apache Solr setup for two different projects - solr

I just started using Apache Solr with its data import functionality on my project
by following the steps in http://tudip.blogspot.in/2013/02/install-apache-solr-on-ubuntu.html
but now I have to run two different instances of my project on the same server, with different databases but the same Solr configuration for both projects. How can I do that?
Can anyone help me with this?

Probably the closest you can get is having two different Solr cores. They will run under the same server but will each have their own configuration (which you can copy and paste).

When you say "different databases", do you mean you want to copy from several databases into one joint collection/core? If so, you just define multiple entities and possibly multiple datasources in your DataImportHandler config and run either all of them or each one individually. See the other question for some tips.
If, on the other hand, you mean different Solr cores/collections, then you just want to run Solr with multiple cores. It is very easy: you just need a solr.xml file above your collection level, as described on the Solr wiki. Once you have the basics, you may want to look at sharing instance directories while keeping separate data directories, so that you do not have to change the same config twice (instanceDir vs. dataDir settings on each core).

Related

Automate creation of some Solr cores on a Linux machine

I need to create a bunch of solr cores on a Linux box. I can do this relatively easily with a combination of command line interactions to create the necessary directory structure, and the solr admin console to actually create the cores.
I would like to automate this process, but I'm not sure how to proceed. I can create the cores using the REST API, but the directory structure needs to already exist as far as I can tell. Also, I am a Windows user. Is there any way this can be done entirely from a Windows machine?
I'm not looking for code samples; I'm looking for advice on the technology/techniques I would use to accomplish this.
The URL for creating a core is "http://localhost:8983/solr/admin/cores?action=CREATE&name=core-name&instanceDir=path/to/dir&config=solrconfig.xml&dataDir=data"
You can write a scheduled job that creates the cores. Before creating a core, check whether the instanceDir exists; if it does not, create it and pass its path in the core creation URL.
A Solr core also requires a configset. You can create your own configset, add the required files to it, and pass its path in the core creation URL as well.
The data dir is the path where the indexes are stored. Create the folder and pass its path in the core creation URL.
You can also store these values (configset, instanceDir, and so on) in database tables and read them when creating the core. You can then change the values in the database as required, without modifying the code.
If you are running on Unix, you can also use a cron job to create the cores.
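The steps above can be sketched in Python. This is only a minimal sketch, assuming the script runs on the Solr host itself (so it can create directories there); the helper names `build_create_core_url` and `prepare_instance_dir` are hypothetical, and the parameters are the ones from the URL quoted in this answer.

```python
from pathlib import Path
from urllib.parse import urlencode


def build_create_core_url(base_url, name, instance_dir,
                          data_dir="data", config="solrconfig.xml"):
    """Return the CoreAdmin CREATE URL for one core (hypothetical helper)."""
    params = {
        "action": "CREATE",
        "name": name,
        "instanceDir": instance_dir,
        "config": config,
        "dataDir": data_dir,
    }
    # urlencode percent-encodes the values, so paths with slashes are safe.
    return f"{base_url.rstrip('/')}/admin/cores?{urlencode(params)}"


def prepare_instance_dir(instance_dir):
    """Create the instance directory and its conf/ folder if they are missing."""
    conf_dir = Path(instance_dir) / "conf"
    conf_dir.mkdir(parents=True, exist_ok=True)
    return conf_dir
```

A scheduled job could call `prepare_instance_dir`, copy the config files into the returned folder, and then request the generated URL, for example with `urllib.request.urlopen`.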

Solr - How do you get cores on different servers to have the same name when creating via HTTP

I have run the following via HTTP:
http://solr-uat.cambridgeassessment.org.uk/solr/admin/collections?action=create&name=ocr_education_and_learning_web8&numShards=1&maxShardsPerNode=8&replicationFactor=3&collection.configName=ocr_education_and_learning
and it created the collection but the cores on each server (there are 3 servers) have had the name appended (e.g. ocr_education_and_learning_web8_shard1_replica1). I am integrating with SI4T and it seems to use the core name rather than the collection name so the core names need to be the same across servers but I can't find how to do this.
Can anyone advise how best to do this?
As far as I know you can't do this. Core names must be unique. This naming scheme is internal to SolrCloud and is used to distinguish different indexes ('cores') from each other (which each make up part of the overall collection).
See this nice answer for more information
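If the integration only needs to map a core name back to its collection, the suffix can be split off on the client side. The following is a rough sketch that assumes core names follow exactly the `<collection>_shardN_replicaN` pattern shown in the question; other Solr versions may use a different suffix, and `split_core_name` is a hypothetical helper, not part of Solr or SI4T.

```python
import re

# Matches only the naming pattern shown in the question,
# e.g. "mycollection_shard1_replica1".
CORE_NAME = re.compile(
    r"^(?P<collection>.+)_(?P<shard>shard\d+)_(?P<replica>replica\d+)$"
)


def split_core_name(core_name):
    """Return (collection, shard, replica), or None if the name does not match."""
    match = CORE_NAME.match(core_name)
    if match is None:
        return None
    return match.group("collection"), match.group("shard"), match.group("replica")
```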

Solr luceneMatchVersion syntax

I have Solr 4.10 and a collection on it whose solrconfig.xml has the following value for <luceneMatchVersion>:
<luceneMatchVersion>4.7</luceneMatchVersion>
Is this correct? I have seen other examples with values such as LUCENE_35. I also need to know how to express LUCENE_xx for my current Solr version.
You should use:
<luceneMatchVersion>4.10.4</luceneMatchVersion>
I recommend checking your current Solr version; in my case it was 4.10.4.
If you are going to reindex, then both numbers should match. The only reason you might want them to differ is if you had an index created with, say, Lucene 4.7, in which case you would have
<luceneMatchVersion>4.7</luceneMatchVersion>
Then you upgrade Lucene to 4.10.
Now, if the changes between 4.7 and 4.10 include anything that alters analysis (the same sentence analysed in both versions produces different output), you might want to keep the version number at 4.7. Otherwise, some queries that contain affected terms might not match, because the terms were analysed differently at index time than at query time. You have to assess how critical that issue might be.
That is why the recommendation is to upgrade, change the setting to the current version number, and reindex. This way you are sure to avoid any issue.
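To check whether a config still carries an older value, the setting can be read from solrconfig.xml and compared with the running version. This is a minimal sketch assuming the element sits directly under the root `<config>` element and holds a plain dotted version such as 4.7; the function names are hypothetical, and the running version has to be supplied by you.

```python
import xml.etree.ElementTree as ET


def read_lucene_match_version(solrconfig_xml):
    """Return the text of <luceneMatchVersion>, or None if it is absent."""
    root = ET.fromstring(solrconfig_xml)
    element = root.find("luceneMatchVersion")
    if element is None or element.text is None:
        return None
    return element.text.strip()


def versions_differ(configured, running):
    """Compare only major.minor, so 4.10 and 4.10.4 count as the same."""
    return configured.split(".")[:2] != running.split(".")[:2]
```

If `versions_differ` returns true, that is the case described above: either keep the old value deliberately or update it and reindex.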
If anyone is using Drupal, the Search API Solr (search_api_solr) module has config templates by version in /sites/all/modules/search_api_solr/solr-conf/.
The template README.md states the following:
The solr-conf-templates directory contains config-set templates for
different Solr versions.
These are templates and are not to be used as config-sets!
To get a functional config-set you need to generate it via the Drupal
admin UI or with drush solr-gsc. See README.md in the module
directory for details.
The module's README.md lists these instructions:
1. Make sure you have Apache Solr started and accessible (i.e. via port 8983). You can start it without having a core configured at this stage.
2. Visit Drupal configuration (/admin/config/search/search-api) and create a new Search API Server according to the search_api documentation using "Solr" as Backend and the connector that matches your setup. Input the correct core name (which you will create at step 4, below).
3. Download the config.zip from the server's details page or by using drush solr-gsc with proper options, for example for a server named "my_solr_server": drush solr-gsc my_solr_server config.zip 8.4.
4. Copy the config.zip to the Solr server and extract.
I generated a config file for 8.x, and it uses this:
<luceneMatchVersion>${solr.luceneMatchVersion:LUCENE_80}</luceneMatchVersion>

Solr cloud distributed search on collections

Currently I have a zookeeper instance controlling replication on 3 physical servers. It is the solr integrated zookeeper. 1 shard, 1 collection.
I have a new requirement in which I will need a new static solr instance (1 new collection, no replication). Same schema as previous collection. A copy of this instance will also be placed on the 3 physical servers mentioned above. A caveat is that I need to perform distributed searches across the 2 collections and have the results blended.
Thanks to javacreed I now know that sharding is not in my solution. Previous questions answers here and here.
In my current setup I run the following command on the server running zookeeper -
java -Dbootstrap_confdir=solr/myApp/conf -Dcollection.configName=myConfig -DzkRun -DnumShards=1 -jar start.jar
Am I correct in saying that this will not change, and that I will now also manually start the non-replicated collection? Do I really only need to change my search queries to include the 'collection' parameter? Something like -
http://localhost:8983/solr/collection1/select?collection=collection1,collection2
This example is from Solr documentation. I am slightly confused as to whether it should be ...solr/collection1/select?... or ...solr/collection2/select?... or if it even matters?
Thanks
Thanks for your kind words, stewart. You can search directly on Solr as
http://localhost:8983/solr/select?collection=collection1,collection2
There is no need to mention any collection path since you are defining them in the collection parameters.
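A query URL of this form can be assembled programmatically. This is a small sketch assuming the base URL and collection names from the example above; `build_multi_collection_query` is a hypothetical helper, and the default `*:*` query is only a placeholder.

```python
from urllib.parse import urlencode


def build_multi_collection_query(base_url, collections, query="*:*"):
    """Build a select URL that searches across several collections."""
    params = {"q": query, "collection": ",".join(collections)}
    # Keep ',', ':' and '*' unescaped so the URL stays readable.
    return f"{base_url.rstrip('/')}/select?{urlencode(params, safe=',:*')}"
```

Passing the base URL in as a parameter lets the same helper produce either the `/solr/select` form from this answer or the `/solr/collection1/select` form from the question.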

Multiple Solr One Application Server (Jboss 7.1)

Is it possible to have multiple Solr instances in the same application server?
If yes, how can I do it?
I need 3 Solr instances and I want them running on the same application server.
I'm using Solr 3.6 and JBoss 7.1.
Thanks in advance!
It basically depends on what exactly your requirement is.
If your requirement is just to have 3 separate indexes to search upon 3 different modules within a single application, you could probably go with multiple cores in same Solr server.
Refer http://wiki.apache.org/solr/CoreAdmin for more details regarding Solr cores.
If you are planning to host a separate search server for each of 3 independent applications, then I would suggest you go with 3 Solr instances on different ports, as described in the other answer.
Yes. You can deploy them on different ports.
http://localhost:8080/solr1
http://localhost:8081/solr2
http://localhost:8082/solr3
and so on.
Check out the instructions from this link http://wiki.apache.org/solr/SolrJBoss

Resources