Using the Solr 4 spatial field types seems to require an external library, the Java Topology Suite. How does one install this suite for use with Solr 4.1.0 on Ubuntu Server 12.04 with Java 1.6.0_24?
Thank you.
If you are running Solr in Tomcat on your Ubuntu Server and have deployed the Solr WAR into your <path to Tomcat>/webapps folder, then according to the Lucene / Solr 4 Spatial documentation on the Solr Wiki, you just need to copy all the jar files from the JTS distribution's /lib folder to the WEB-INF/lib folder where Solr is running.
Update
Since you are using Jetty to run Solr, you will need to add the location of the JTS jar files to the classpath. Based on the Classloading Jetty documentation, something like the following should work:
java -Dsolr.solr.home=/mnt/SolrFiles/solr \
  -Djetty.class.path=<insert path to JTS here> -jar /opt/solr-4.1.0/example/start.jar
The JTS JAR file needs to be placed in the Solr web application's WEB-INF/lib folder. Otherwise you may encounter a NoClassDefFoundError: com/vividsolutions/jts/geom/Geometry when starting Solr.
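If that error still shows up, it can help to confirm that the jar you copied really contains the class named in it. Here is a minimal sketch in Python, assuming only that a jar is an ordinary zip archive; the function name and the jar path in the usage comment are made up for illustration:

```python
import zipfile


def jar_has_class(jar, class_name):
    """Return True if the jar (a zip archive) contains the compiled class.

    `jar` may be a file path or a binary file-like object.
    `class_name` is a dotted name such as com.vividsolutions.jts.geom.Geometry.
    """
    # Classes are stored inside a jar as path/to/ClassName.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        return entry in archive.namelist()


# Example use (hypothetical path, adjust to your own WEB-INF/lib folder):
# jar_has_class("/path/to/WEB-INF/lib/jts.jar", "com.vividsolutions.jts.geom.Geometry")
```

If this returns False for every jar in WEB-INF/lib, the JTS jar is either missing or is not the one that ships the com.vividsolutions packages.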
Related
In Apache Solr 8.4.1, where can I find the port on which Solr listens for REST API requests?
Where is the Apache Solr port setting: somewhere in the Solr filesystem, in the source files, or in a particular XML file?
At the root of your Solr installation, you will find a bin folder that contains the scripts used to interact with Solr instances (this is what we have here).
The port Solr binds to and other settings are defined in solr.in.sh (or solr.in.cmd if you are on a Windows machine). As stated in that file:
Settings here will override settings in existing env vars or in bin/solr. The default shipped state of this file is completely commented.
By default, you should have this:
#SOLR_PORT=8983
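Because the line is commented out by default, Solr falls back to 8983 unless the include file (or an existing environment variable, or a command-line option) says otherwise. As a rough illustration, here is a small Python sketch that reads the text of solr.in.sh and reports the port set there. The function name is made up, and it only looks at plain SOLR_PORT=<number> assignments in this one file:

```python
import re

# Matches an uncommented assignment such as SOLR_PORT=8984 or SOLR_PORT="8984"
_PORT_LINE = re.compile(r'^\s*SOLR_PORT=["\']?(\d+)["\']?\s*$')


def port_from_include(include_text, default=8983):
    """Return the port assigned in solr.in.sh text, or the default if none is set."""
    port = default
    for line in include_text.splitlines():
        match = _PORT_LINE.match(line)
        if match:
            # A later assignment overrides an earlier one, as in a shell script
            port = int(match.group(1))
    return port


# Example use (the path depends on your installation):
# with open("bin/solr.in.sh") as f:
#     print(port_from_include(f.read()))
```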
I have a standalone Solr instance with 4 different cores working fine using the embedded Jetty server. I configured the cores for v4.10.3, but I have since moved to v5.1 and everything seems to work fine without any changes.
Before going into production, I need to set it up as a Solrcloud installation, initially with 2 nodes (two different machines) with 1 shard per node (to keep it simple). I have been trying to get it to work but I have not been able to do it.
I tried to run it like this (I think using start.jar is not the preferred way), having read that Solr will look for multiple configured cores in any nested folders (which works for standalone Solr):
java -DzkRun -DnumShards=2 -Dbootstrap_confdir=solr/ -jar start.jar
but that did not work: it does not find the needed solrconfig.xml file.
My Solr directory looks like this:
My solr.xml file is the standard one:
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory"
                       class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>
Each core looks like this:
And the core.properties just has the name of the core:
name=users
My question is:
How do I start Solrcloud v5.1 so the 4 cores are picked up?
In SolrCloud, each of your cores will become a collection.
Each collection will have its own set of config files and data.
You might find this helpful: Moving multi-core SOLR instance to cloud
Solr 5.0 (onwards) has made some changes on how to create a SolrCloud setup with shards, and how to add collections etc.
Everything listed below is my understanding of the Solr Reference Guide. I highly recommend going through it thoroughly.
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
I set up my servers on Linux (CentOS), but the same steps can be used to set up Solr on a Windows system. For example, Windows has a solr.cmd file instead of the solr shell script.
Here are the steps I followed to create a simple two shard SolrCloud setup.
Set up the ZooKeeper ensemble. I am assuming you are trying to use the embedded ZK in Solr. For a production system, it is highly recommended to create an external ZK ensemble. You can find steps to install an external ensemble in this section of the reference guide.
Download solr to /opt folder.
Extract the install file ONLY.
tar xzf solr-5.0.0.tgz solr-5.0.0/bin/install_solr_service.sh --strip-components=2
This command will install solr on your system
sudo bash ./install_solr_service.sh solr-5.0.0.tgz
The above command will create a new user called "solr" if it does not exist.
These are some of the default options it will assume. You can view them in /var/solr/solr.in.sh. This is the include file where you can specify other options.
* SOLR_PID_DIR=/var/solr
* SOLR_HOME=/var/solr/data
* LOG4J_PROPS=/var/solr/log4j.properties
* SOLR_LOGS_DIR=/var/solr/logs
* SOLR_PORT=8983
Running install_solr_service.sh in the above step also starts a Solr server. Stop the server using service solr stop before making any of the changes below.
Change Java heap value
SOLR_HEAP="3g"
This sets both Xmx and Xms to 3 GB. (Optional)
This variable is not mentioned in the solr.in.sh file in Solr 5.1. It's a bug that has been fixed; the fix will be released in the next version.
SOLR_MODE="solrcloud" Required
This is what you need to start Solr in cloud mode.
ZK_HOST=ZK1:2181,ZK2:2181,ZK3:2181 Required
(Replace ZK1, ZK2 and ZK3 with your ZooKeeper host names.)
Running the install_solr_service.sh command also creates an init.d file at /etc/init.d/solr
This init.d script in turn calls the /opt/solr/bin/solr script and includes all the variables from /var/solr/solr.in.sh
Once you have made the above changes, start solr again using service solr start
You can check the status using service solr status
Creating Collections Shards and Replicas
- All shard-, collection- and replica-related commands are now made using the Collections API.
Before creating a collection a config folder should be uploaded to ZK .
This can be done using the zkcli.sh script in the solr folder (not on the zookeeper servers)
Folder: /opt/solr/server/scripts/cloud-scripts
The command to upload the config folder is
sh zkcli.sh -cmd upconfig -zkhost zk1:2181,zk2:2181,zk3:2181 -confname yourconfigname -confdir /var/solr/configs/conf
You will run this command once for each of your 4 cores (4 times in total), each time changing the path of the conf folder and the config name.
This will upload all the config files in conf folder with the name 'yourconfigname' in zookeeper.
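To avoid typing the command four times, the per-core invocations can be generated. This is only a sketch: the function name, the core names and the /var/solr/configs/<core>/conf layout are assumptions of mine, so adjust them to wherever your conf folders actually live. It builds the command lines and prints them, it does not run anything:

```python
def upconfig_commands(zk_host, cores, config_root="/var/solr/configs"):
    """Build one zkcli.sh upconfig command (as an argument list) per core."""
    commands = []
    for core in cores:
        commands.append([
            "sh", "zkcli.sh",
            "-cmd", "upconfig",
            "-zkhost", zk_host,
            "-confname", core,  # config name in ZooKeeper, here simply the core name
            "-confdir", f"{config_root}/{core}/conf",  # assumed folder layout
        ])
    return commands


# Hypothetical core names; only "users" appears in the question
for command in upconfig_commands(
    "zk1:2181,zk2:2181,zk3:2181", ["users", "core2", "core3", "core4"]
):
    print(" ".join(command))
```

Each printed line can then be run from /opt/solr/server/scripts/cloud-scripts, or passed to subprocess.run with that folder as the working directory.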
Creating a collection
I used the following command to create a new collection.
http://1.1.1.1:8983/solr/admin/collections?action=CREATE&name=yourcollectionname&numShards=2&replicationFactor=1&maxShardsPerNode=1&createNodeSet=1.1.1.1:8983_solr,2.2.2.2:8983_solr&collection.configName=yourconfigname
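Since you will create one collection per core, it may be easier to build that URL from its parts. A minimal sketch using only the Python standard library; the function name is mine, and the parameters are exactly the ones in the URL above:

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name, nodes,
                          num_shards=2, replication_factor=1,
                          max_shards_per_node=1):
    """Build a Collections API CREATE URL like the one shown above."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        "createNodeSet": ",".join(nodes),
        "collection.configName": config_name,
    }
    # Keep ':' and ',' unescaped so the node list stays readable
    return f"{base_url}/admin/collections?{urlencode(params, safe=':,')}"


print(create_collection_url(
    "http://1.1.1.1:8983/solr",
    "yourcollectionname",
    "yourconfigname",
    ["1.1.1.1:8983_solr", "2.2.2.2:8983_solr"],
))
```

The printed URL can be opened in a browser or requested with any HTTP client.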
Happy Searching!
SolrCloud does not use the configuration files stored in a core's conf directory. To make your cores visible in the SolrCloud structure, you need to upload the configuration files to ZooKeeper and let it manage the files for you. Every time a Solr instance comes up, it gets the configuration files stored in ZooKeeper. This way your cores don't need a conf directory to work. To upload your core configuration files to ZooKeeper, follow the link below and take a look at "Upload a configuration directory":
https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities
I'm having a little trouble understanding how Solr fits in with Jetty, and why I can't seem to get the start.jar in the distribution package to work.
I can run all of the example configurations via java -jar start.jar. However, when I try to run something like the following --
java -Dsolr.solr.home=/Users/jwwest/solr -jar $(brew --prefix solr)/libexec/example/start.jar
-- the following error occurs:
java.io.FileNotFoundException: No XML configuration files specified in start.config or command line.
at org.eclipse.jetty.start.Main.start(Main.java:506)
at org.eclipse.jetty.start.Main.main(Main.java:95)
I opened up the start.jar file, and there is a start.config file located inside of the jar which I'm assuming should handle this configuration for me. I'm not understanding why it will work when run from inside of the distribution examples directory, but not outside of it.
You also need to define the jetty.home property. Try:
java -Dsolr.solr.home=/Users/jwwest/solr -jar $(brew --prefix solr)/libexec/example/start.jar -Djetty.home=$(brew --prefix solr)/libexec/example
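The idea is that jetty.home should point at the folder that contains start.jar, so it can be derived from the jar's path instead of being typed twice. A small sketch of that; the function name and the expanded Homebrew path in the usage comment are assumptions, and here both properties are passed as JVM options before -jar:

```python
from pathlib import PurePosixPath


def jetty_start_command(start_jar, solr_home):
    """Build a java command that sets jetty.home to the folder holding start.jar."""
    jetty_home = PurePosixPath(start_jar).parent
    return [
        "java",
        f"-Dsolr.solr.home={solr_home}",
        f"-Djetty.home={jetty_home}",
        "-jar", str(start_jar),
    ]


# Example use (hypothetical expansion of $(brew --prefix solr)):
# print(" ".join(jetty_start_command(
#     "/usr/local/opt/solr/libexec/example/start.jar", "/Users/jwwest/solr")))
```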
You can see the effective command line start.jar generates by using the --dry-run command line flag.
java -jar start.jar --dry-run
That will output everything with full path names so you can run it from outside the directory.
Source: http://www.eclipse.org/jetty/documentation/9.0.0.M3/advanced-jetty-start.html
The start.jar is a jetty specific mechanism that works to build out all the classpath requirements for starting up Jetty. It is generally only used in the scope of the jetty distribution. Pulling the start.jar out of the configuration and placing it somewhere else renders the default configuration of the start.config rather moot.
My understanding of Solr is that it bundles itself with a distribution of jetty, placing what it needs to run into the distribution and repackages it as its own. They may have a custom start.config file that further adds its own locations for classpath resources and the like, or not.
The exception you are seeing stems from the start.config file expecting an etc/ directory containing jetty.xml formatted XML files, which are used to configure the Jetty process.
Jetty being often used in an embedded format has little to do with this issue, it is simply a common use case because jetty is incredibly easy to embed into an application. Embedded instances of jetty rarely (if ever) leverage a start.jar...instead it is up to the embedding application to manage its own classpath.
First, you need to change to the folder where start.jar is located, then execute the same command.
Jetty is often used as an embedded container. If you want to use Jetty, a good start would be to copy the example directory and rename it to whatever you want it to be. The solr directory is the one for basic configuration.
Otherwise, it is recommended to use Tomcat and the solr.war file.
I'm using Tika parser to index my files into Solr. I created my own parser (which extends XMLParser). It uses my own mimetype.
I created a jar file which inside looks like this:
src
|-main
   |-some_packages
   |  |-MyParser.java
   |-resources
      |-META-INF
      |  |-services
      |     |-org.apache.tika.parser.Parser (which contains a line:some_packages.MyParser.java)
      |-org
         |-apache
            |-tika
               |-mime
                  |-custom-mimetypes.xml
In custom-mimetypes I put the definition of the new MIME type because my XML files have some special tags.
Now, here is the problem: I've been testing parsing and indexing with Solr on GlassFish installed on my local machine, and it worked just fine. Then I wanted to install it on a remote server with the same version of GlassFish (3.1.1). I copied over the Solr application and its home directory with all libraries (including the Tika jars and the jar with my custom parser). Unfortunately, it doesn't work. After posting files to Solr, I can see in the content-type field that it detected my custom MIME type, but the fields that are supposed to be there are missing, as if the MyParser class was never run. The only fields I get are the ones from Dublin Core. I checked (by simply adding some print lines) that Tika is only using XMLParser.
Has anyone had a similar problem? How can I handle this?
Problem was that I was using Java 7 to compile my parser but Apache Tika was compiled with Java 5...
I want to setup search for my site. I couldn't find much information to install Jetty + Solr on my linode.
I could install solr-jetty on Ubuntu simply using apt-get. Does anybody have better experience with Debian?
You shouldn't need to install it through the package manager. Jetty is bundled with the Solr download from the Solr homepage. So long as you have Java installed on your server, you can simply unpack the Solr distribution and read example/README.txt.