Configure Solr for SQL Server [duplicate] - sql-server

Can you help me configuring Apache Solr using Tomcat and how to index in MS SQL database using Solr.
What are the steps to configure Tomcat to run Apache Solr in Tomcat.

Here is the step by step procedure that would help.
PART 1: SETTING UP SOLR with TOMCAT
Step 1: Download Solr. It's just a zip file.
Step 2: Copy from your SOLR_HOME_DIR/dist/apache-solr-1.3.0.war to your tomcat webapps directory: $CATALINA_HOME/webapps/solr.war – Note the war file name change. That’s important.
Step 3: Create your solr home directory at a location of your choosing. This is where the configuration for that solr install resides. The easiest way to do this is to copy the SOLR_HOME_DIR/examples/solr directory to wherever it is you want your solr home container to be. Say place it in C:\solr.
Step 4: Hope you have set your environment variables, if not then please set JAVA_HOME, JRE_HOME, CATALINA_OPTS, CATALINA_HOME. Note that CATALINA_HOME refers to your Tomcat directory & CATALINA_OPTS refers to the amount of heap memory you want to give to your Solr.
Step 5: Start tomcat. Note this is only necessary to allow tomcat to unpack your war file. If you look under $CATALINA_HOME/webapps there should now be a solr directory.
Step 6: Stop tomcat
Step 7: Go into that solr directory and edit WEB-INF/web.xml. Scroll down until you see an entry that looks like this:
<!-- People who want to hardcode their "Solr Home" directly into the
WAR File can set the JNDI property here...
-->
<!--
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/Path/To/My/solr/Home/solr/</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
-->
Set your Solr home (for example: C:\solr) and uncomment the env entry.
Step 8: Start Tomcat again, and things should be going splendidly. You should be able to verify that solr is running by trying the url http://localhost:8080/solr/admin/.
PART 2: SETTING UP SOLR WITH MSSQL SERVER USING DATA IMPORT HANDLER
Step 1: Download Microsoft SQL Server JDBC Driver 3.0. Just extract the contents. Create a folder under your solr home directory (Example: C:\solr\lib). Copy the file sqljdbc4.jar out of the archive downloaded above into it.
Step 2: So under your Solr home the basic directories needed are conf and lib. The first one i.e. conf you might have got with Step 3 of part 1 & lib is the directory you have created in Step 1 of part 2.
Step 3. Go to conf directory. Please open 3 files in your editor: data-config.xml, schema.xml & solrconfig.xml.
Step 4. Start by editing data-config.xml. Place your SQL query, DB Name, Server name etc. For an example:
• <dataConfig>
• <dataSource type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://X.Y.Z.U:1433;databaseName=myDB" user="test" password="tester" />
• <document>
• <entity name="Text" query="select DocumentId, Data from Text">
• <field column="DocumentId" name="DocumentId" />
• <field column="Data" name="Data" />
• </entity>
• </document>
• </dataConfig>
Step 5: Tell Solr about our data-config.xml file. This would be done by adding a request handler to the solrconfig.xml file which is solr configuration file.
Add the following requesthandler to solrconfig.xml:
• <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
• <lst name="defaults">
• <str name="config">C:\solr\conf\data-config.xml</str>
• </lst>
• </requestHandler>
Step 6: Configure schema.xml - In this file you can do several stuff like setting up datatypes of your fields, setting unique/primary key of your search etc.
Step 7: Start Tomcat
Step 8: Now visit http://localhost:8080/solr/admin/dataimport.jsp?handler=/dataimport & start your full import.
Some handy Notes:
• There are a number of reasons a data import could fail, most likely due to problem with
the configuration of data-config.xml. To see for sure what's going on you'll have to look in
C:\tomcat6\logs\catalina.*.
• If you happen to find that your import is failing due to system running out of memory,
however, there's an easy, SQL Server specific fix. Add responseBuffering=adaptive and
selectMethod=cursor to the url attribute of the dataSource node in data-config.xml. That stops the
JDBC driver from trying to load the entire result set into memory before reads can occur.
• Note that by default the index gets created in C:\Tomcat6\bin\solr\data\index. To change this path
just edit solrconfig.xml & change <dataDir>${solr.data.dir:./solr/data}</dataDir>.
• In new Solr versions, I think 3.0 and above you have to place the 2 data import handler
jars in your solr lib directory (i.e. for example apache-solr-dataimporthandler-3.3.0.jar & apache-
solr-dataimporthandler-extras-3.3.0.jar). Search for them in your Solr zip you downloaded. In older
Solr versions this is not required because they are bundled with solr.war. Since we have placed the
data import handlers in the lib directory so we need to specify their paths in solrconfig.xml. Add
this line to solrconfig.xml: (Example: <lib dir="C:/solr/lib/" regex="apache-solr-dataimporthandler-
\d.*\.jar" />)

Related

Update jar file in Solr 4.4.0

I have Solr cloud configuration which we run on 4 servers. We use tomcat as web server for solr. I have 5 zookeepers to maintain the data-replication. I have added a jar file with custom update processor. This is in shared folder which is mention in solr.xml
<solr persistent="true" sharedLib="/solr/lib">
While creating the first version of this jar file I gave the name updateProcessor.0.1.jar as the file name. Even though it was shared, jar files were added in all the 4 servers.
But now I have to update the updateProcessor. For this I created updateProcessor0.2.jar. I deleted the updateProcessor.0.1.jar from each sever and added a new one. But changes were not seen ?
Any ideas what I am doing wrong? Should this is be checked using zkcli ?
Well I found a roundabout which may help someone in future maybe.
I changed entry in solrconfig from
<processor class="org.apache.solr.update.processor.MyUpdateProcessorFactory">
to
<processor class="org.apache.solr.update.processor.MyUpdateProcessorFactory2">
I renamed the class file I created in solr config from MyUpdateProcessorFactory to MyUpdateProcessorFactory2

Schema.xml is not created in the core directory

I was following this article to start with my apache solr expedition.
http://examples.javacodegeeks.com/enterprise-java/apache-solr/apache-solr-tutorial-beginners/
I have created a solr core using below command-
> solr create -c mycorename
This has created a core but schema.xml file is not created inside the conf directory. Instead of this i am able to see managed-schema.xml file. Does this command create a schemaless core. Please let me know how i can create a core that also have a schema.xml file created in it.
Yes, i've got some issue.
Shortly. Solr can use ether static schema, or dynamic (REST api) schema. So, you should select which one you'll use.
You can do id in solrconfig.xml
Like this:
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
More info, read this guide

Which schema.xml file edit in Solr?

I downloaded a Solr package from here: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
I want to create a new field in schema.xml file, but I don't know in which one - in downloaded folder there are 7 schema.xml files.
I edited all of this files, but nothing changed.
Where should I add a new field definition?
If you have standard distribution (http://lucene.apache.org/solr/mirrors-solr-latest-redir.html) you can find your cores at solr-5.2.1/server/solr
One problem you might face is that you don't have any cores defined yet. If that is a case copy solr-5.2.1\server\solr\configsets\basic_configs to solr-5.2.1\server\solr\my_new_core (rename folder) - congratz, you defined a new core.
Now run the server: run in command line solr-5.2.1\bin\solr with parameter start. Open in your browser: http://localhost:8983/solr/#/~cores/example_core and Add core with instanceDir = my_new_core. This initializes your core - makes it fully functional.
Now you can find solr-5.2.1\server\solr\example_core\conf\solrconfig.xml
and configure it as you will. After changing solrconfig.xml remember to reload core at Core Admin.
Create a solr instance by running a command from the command prompt
solr create -c test
Here test is the collection name.Copy schema.xml and solrconfig.xml from techproducts project folder to conf folder of the test project.Now you can define your schema in the schema.xml.

Solrcloud multicore configuration

I have a standalone Solr instance with 4 different cores working fine using the embedded Jetty server. I configured the cores for v4.10.3 but since I moved to v5.1 and all seems to work fine without any changes.
Before going into production, I need to set it up as a Solrcloud installation, initially with 2 nodes (two different machines) with 1 shard per node (to keep it simple). I have been trying to get it to work but I have not been able to do it.
I tried to run it like this (I think using start.jar is not the preferred way), having read that Solr will look for multiple configured cores in any nested folders (which works for standalone Solr):
java -DzkRun -DnumShards=2 -Dbootstrap_confdir=solr/ -jar start.jar
but that did not work, it does not find the needed solrconfig.xml file.
My Solr directory looks like this:
My solr.xml file is the standard one:
<solr>
<solrcloud>
<str name="host">${host:}</str>
<int name="hostPort">${jetty.port:8983}</int>
<str name="hostContext">${hostContext:solr}</str>
<int name="zkClientTimeout">${zkClientTimeout:30000}</int>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
</solrcloud>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:0}</int>
<int name="connTimeout">${connTimeout:0}</int>
</shardHandlerFactory>
</solr>
Each core looks like this:
And the core.properties just has the name of the core:
name=users
My question is:
How do I start Solrcloud v5.1 so the 4 cores are picked up?
In SolrCloud each of your Core will become a Collection.
Each Collection will have its own set of Config Files and data.
You might find this helpful Moving multi-core SOLR instance to cloud
Solr 5.0 (onwards) has made some changes on how to create a SolrCloud setup with shards, and how to add collections etc.
Everything listed below is my understanding of the Solr Reference Guide. I will highly recommend going through it thoroughly.
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
I setup my servers on a Linux(CentOS) server, but the steps can be used to setup solr on Windows system also. For example, there is solr.cmd file instead of solr.sh
Here are the steps I followed to create a simple two shard SolrCloud setup.
Setup the zookeeper ensemble. I am assuming you are trying to use the
embedded ZK in solr. For a production system, it is highly
recommended to create a external ZK ensemble. You can find steps to install a external ensemble in this section of reference guid
Download solr to /opt folder.
Extract the install file ONLY.
tar xzf solr-5.0.0.tgz solr-5.0.0/bin/install_solr_service.sh --strip components=2
This command will install solr on your system
sudo bash ./install_solr_service.sh solr-5.0.0.tgz
The above command will create a new user called "solr" if it does not exist.
These are some of the default options it will assume. You can view this in /var/solr/solr.in.sh . This is the include file where you can specify other options.
* SOLR_PID_DIR=/var/solr
* SOLR_HOME=/var/solr/data
* LOG4J_PROPS=/var/solr/log4j.properties
* SOLR_LOGS_DIR=/var/solr/logs
* SOLR_PORT=8983
Running install_solr_service start in the above step will start a solr server. Stop the server using service solr stop before doing any of the changes below.
Change Java heap value
SOLR_HEAP="3g"
This will set Xmx and Xms as 3GB . (optional)
This variable is not mentioned in the solr.in.sh file in Solr 5.1 . Its a bug and has been fixed, will be released in next version.
SOLR_MODE="solrcloud" Required
this is what you need start solr in cloud mode.
ZK_HOST=ZK1:2181,ZK2:2181,ZK3:2181 Required
(replace zk with you zookeeper host names)
Running the install_solr_service.sh command also creates a init.d file as /etc/init.d/solr
This init.d script in turn calls the /opt/solr/bin/solr script and includes all the variables from /var/solr/solr.in.sh
Once you have made the above changes, start solr again using service solr start
You can check the status using service solr status
Creating Collections Shards and Replicas
- All shard, collection, replica related commands are now made using Collections API.
Before creating a collection a config folder should be uploaded to ZK .
This can be done using the zkcli.sh script in the solr folder (not on the zookeeper servers)
Folder: /opt/solr/server/scripts/cloud-scripts
The command to upload the confg folder is
sh zkcli.sh -cmd upconfig -zkhost zk1:2181,zk2:2181,zk3:2181 -confname yourconfigname -confdir /var/solr/configs/conf
You will run this command 4 times for each of your 4 cores, each time changing the path of the conf folder and config name.
This will upload all the config files in conf folder with the name 'yourconfigname' in zookeeper.
Creating a collection
I used the following command to create a new collection.
http://1.1.1.1:8983/solr/admin/collections?action=CREATE&name=yourcollectionname&numShards=2&replicationFactor=1&maxShardsPerNode=1&createNodeSet=1.1.1.1:8983_solr,2.2.2.2:8983_solr&collection.configName=yourconfigname
Happy Searching!
SolrCloud does not use configuration files stored in core conf directory. To make your cores visible in SolrCloud structure you need to upload the configuration files to ZooKeeper and keep it manage the files to you. All the time a Solr instance comes up it get the configuration files stored in ZooKeeper. This way your cores doesn't need to have conf directory to work. To upload your core configuration files to ZooKeeper follow the link bellow and take a look at Upload a configuration directory
https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities

Solr (4.4+) solrconfig.xml location when creating cores

I'm Trying to setup a multi core solr server for our webapplication but i'm having trouble creating new core through the coreadmin service.
I'm using Solr-4.4 because 4.3 ran into problems persisting the cores in solr.xml (datadir wasn't preserved) So i'm using the new Solr.xml configuration 4.4 and beyond
My solr.xml currently looks like:
<solr>
<str name="coreRootDirectory">default-instance/cores/</str>
</solr>
solrconfig.xml is located at (solrhome)/default-instance/conf/solrconfig.xml
When trying to create a core with the url
http:/example.org/solr/admin/cores?action=CREATE&name=test-name&schema=schema-test.xml&loadOnStartup=false
gives me the error:
Error CREATEing SolrCore 'test-name': Unable to create core: test-name
Caused by: Can't find resource 'solrconfig.xml' in classpath or
'default-instance/cores/test-name/conf/', cwd=/var/lib/tomcat7
The following seems to work:
http:/example.org/solr/admin/cores?action=CREATE&name=test-name&schema=schema-test.xml&loadOnStartup=false&config=/absolute/file/path/to/solrconfig.xml
The problem is this only seems to work with a absolute path (or possibly a relative path from /var/lib/tomcat7) which is not a workable solution.
What i'm looking for is a way to place solrconfig.xml so it can be used to create new cores with that config (or a way the create those cores with the current location).
More or less the same will be needed for schemas
This worked. Ran on command line and was viewable in admin console:
solr create -c (name for core or collection)
See README.txt for more info.
In my case I took advantage of the Core Discovery feature in 4.4+, rather than creating the core using the management web interface.
This simply involved copying the example collection1 folder from the examples directory (which I usually use as a starting point).
Then I had to make sure that there is core.properties in the root of my new core with name=<new core name> inside. Solr automatically detected the new core and allowed me to use it without any fuss.
This avoided the trouble of having to copying solrconfig.xml and schema.xml into any special location.
I had the same problem: solrconfig.xml was not in the classpath. I solved it by copying my configuration file templates into the classpath.
So I took a look at http://localhost:8983/solr/#/~java-properties to see solrs classpath definition and then i copied the template solrconfig.xml and schema.xml into the folder C:\servers\solr-4.4.0\example\resources. Furthermore i copied all the stopwords stuff there...
This solution is not a fully satisfying, but it works. Adding another path to the classpath should work, too. I'm slightly astonished that no default configuration for new cores can be declared within solr.xml
I recommend the new Config Sets for this use case.
If you place your schema.xml and solrconfig.xml (and other config files like stopwords etc.) in a directory $SOLR_HOME/configsets/myConfig/conf, you can create a new core with this config by calling:
http://solr/admin/cores?action=CREATE&name=mycore&instanceDir=my_instance&configSet=myConfig
See https://cwiki.apache.org/confluence/display/solr/Config+Sets
But they are not available until Solr 4.8, see https://issues.apache.org/jira/browse/SOLR-4478

Resources