Index a shared folder with Solr - solr

How can I index a shared folder (not local) with Solr? Is it possible, or should I copy my shared folder into a local folder?

You can definitely run the indexer on a different server from Solr. You just need to run the post tool with the right parameters.
So, two things:
You can run the post tool as a jar, rather than invoking it with a classpath and a fully qualified class name.
You can see all supported parameters by running java -jar example\exampledocs\post.jar -h; the one you want is -Durl.
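As a rough sketch of the second point, the helper below only prints the kind of command you might run on the machine that can see the share. The host name, core name, and mount path are made-up placeholders, and -Drecursive=yes and -Dauto=yes are taken from the post tool's help text rather than from this answer, so check the -h output of your version first.

```shell
# Sketch only: prints a post.jar command line instead of running it.
# Every value passed in below is a hypothetical placeholder.
print_post_cmd() {
  solr_update_url=$1   # update URL of the remote Solr core
  post_jar=$2          # path to post.jar on this machine
  folder=$3            # shared folder as seen from this machine
  printf 'java -Durl=%s -Drecursive=yes -Dauto=yes -jar %s %s\n' \
    "$solr_update_url" "$post_jar" "$folder"
}

print_post_cmd "http://solr-host:8983/solr/mycore/update" \
  "example/exampledocs/post.jar" "/mnt/shared/docs"
```

Once the printed line looks right for your setup, you would run it directly instead of printing it.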

Related

How to specify file types when indexing Solr

I've been indexing a directory of folders and files containing HTML pages, docs, ppts, pdfs, etc. I noticed that files of type LOG are being indexed, and I don't want them indexed because their contents aren't needed.
To index to Solr I've been using this command (I am a Windows user, so I use the simple post tool):
java -Dc=collection -Dport=4983 -Drecursive -Dauto -jar example/exampledocs/post.jar c:/folder
Instead, I tried the following command to exclude LOG files:
java -Dc=collection -Dport=4983 -Drecursive -Dfiletypes=xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt -jar example/exampledocs/post.jar c:/folder
Solr refuses to index and throws errors (HTTP 400). -Dfiletypes should be an actual option I can use, but Solr doesn't seem to like it. I even tried [] around the list of file types, and it won't work. Is my syntax wrong?
If I add -Dauto, it works!
java -Dc=collection -Dport=4983 -Drecursive -Dauto -Dfiletypes=xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt -jar example/exampledocs/post.jar c:/folder
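One plausible reading of the 400 errors: the tool's help text gives application/xml as the default content type, so without -Dauto every file may be posted as XML whatever its extension. If you would rather not retype the whole list, a small helper can drop one extension from it. The sketch below is plain POSIX sh; drop_type is a hypothetical name, and the short list is only an illustration.

```shell
# Drop one extension from a comma-separated list (plain POSIX sh).
drop_type() {
  list=$1
  unwanted=$2
  result=
  old_ifs=$IFS
  IFS=,
  for ext in $list; do
    if [ "$ext" != "$unwanted" ]; then
      # Append with a comma only when result is already non-empty.
      result=${result:+$result,}$ext
    fi
  done
  IFS=$old_ifs
  printf '%s\n' "$result"
}

types=$(drop_type "xml,json,pdf,html,txt,log" log)
# Print (not run) the command from the post above with the filtered list.
printf 'java -Dc=collection -Dport=4983 -Drecursive -Dauto -Dfiletypes=%s -jar example/exampledocs/post.jar c:/folder\n' "$types"
```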

Which schema.xml file should I edit in Solr?

I downloaded a Solr package from here: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
I want to create a new field in the schema.xml file, but I don't know which one to edit: the downloaded folder contains 7 schema.xml files.
I edited all of these files, but nothing changed.
Where should I add a new field definition?
If you have the standard distribution (http://lucene.apache.org/solr/mirrors-solr-latest-redir.html), you can find your cores at solr-5.2.1/server/solr
One problem you might face is that you don't have any cores defined yet. If that is the case, copy solr-5.2.1\server\solr\configsets\basic_configs to solr-5.2.1\server\solr\my_new_core (that is, copy the folder under a new name). Congratulations, you have defined a new core.
Now run the server: on the command line, run solr-5.2.1\bin\solr with the parameter start. Open http://localhost:8983/solr/#/~cores/example_core in your browser and use Add Core with instanceDir = my_new_core. This initializes your core and makes it fully functional.
Now you can find solr-5.2.1\server\solr\my_new_core\conf\solrconfig.xml
and configure it as you wish. After changing solrconfig.xml, remember to reload the core in Core Admin.
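The copy step can be sketched as a small shell function for a Unix-like install. new_core_dir is a hypothetical helper name, and the directory layout is assumed to match the one described in this answer.

```shell
# Copy a configset into a new core directory (sketch; layout assumed
# to be <solr_home>/configsets/<configset> as in the answer above).
new_core_dir() {
  solr_home=$1    # e.g. solr-5.2.1/server/solr
  configset=$2    # e.g. basic_configs
  core=$3         # e.g. my_new_core
  src=$solr_home/configsets/$configset
  dst=$solr_home/$core
  if [ ! -d "$src" ]; then
    echo "no such configset: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "already exists: $dst" >&2
    return 1
  fi
  cp -R "$src" "$dst" || return 1
  # List the copied config files so you can see what there is to edit.
  ls "$dst/conf"
}
```

A call such as new_core_dir solr-5.2.1/server/solr basic_configs my_new_core would then be followed by the Add Core step in the admin UI.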
Create a Solr instance by running this command from the command prompt:
solr create -c test
Here test is the collection name. Copy schema.xml and solrconfig.xml from the techproducts project folder to the conf folder of the test project. Now you can define your schema in schema.xml.

Solr not writing logs when it is run from outside its main folder

When I run solr using
java -jar "C:\solr\example\start.jar"
It writes logs to C:\solr\example\logs.
When I run it using
java -Dsolr.solr.home="C:\solr\example\solr"
-Djetty.home="C:\solr\example"
-Djetty.logs="C:\solr\example\logs"
-jar "C:\solr\example\start.jar"
it writes logs only if I run it from
C:\solr\example>
any other folder - logs are not written.
This is important as I need to run it as a service later (using nssm)
What should I change?
As you have discovered, the Jetty-hosted example distributed with Solr must be started in the example directory to function properly. Try creating a batch file that changes to the directory then invokes Java, like this:
C:
cd C:\solr\example\
java -Dsolr.solr.home="C:\solr\example\solr"
-Djetty.home="C:\solr\example"
-Djetty.logs="C:\solr\example\logs"
-jar "C:\solr\example\
Then have NSSM run the batch file instead of java.
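The batch file above is the Windows form of a general pattern: change into the application's directory first, then launch it. For comparison, here is the same pattern as a POSIX shell sketch; run_from_dir is a hypothetical helper name, and the path in the usage comment is an assumption.

```shell
# Run a command from a fixed working directory, whatever the caller's
# current directory happens to be.
run_from_dir() {
  dir=$1
  shift
  # The subshell keeps the directory change from leaking to the caller.
  ( cd "$dir" && "$@" )
}

# Hypothetical usage:
#   run_from_dir /opt/solr/example java -jar start.jar
```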
Both answers should work for you.
You could also set it up using Apache Tomcat instead of the Jetty instance Solr comes with. Tomcat comes standard with a startup.bat batch file that you use to start the server.

Running Solr with Jetty

I'm having a little trouble understanding how Solr fits in with Jetty, and why I can't seem to get the start.jar in the distribution package to work.
I can run all of the example configurations via java -jar start.jar. However, when I try to run something like the following --
java -Dsolr.solr.home=/Users/jwwest/solr -jar $(brew --prefix solr)/libexec/example/start.jar
-- the following error occurs:
java.io.FileNotFoundException: No XML configuration files specified in start.config or command line.
at org.eclipse.jetty.start.Main.start(Main.java:506)
at org.eclipse.jetty.start.Main.main(Main.java:95)
I opened up the start.jar file, and there is a start.config file located inside of the jar which I'm assuming should handle this configuration for me. I'm not understanding why it will work when run from inside of the distribution examples directory, but not outside of it.
You also need to define the jetty.home property. Try:
java -Dsolr.solr.home=/Users/jwwest/solr -jar $(brew --prefix solr)/libexec/example/start.jar -Djetty.home=$(brew --prefix solr)/libexec/example
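In the command above, jetty.home is simply the directory that contains start.jar, so it can be derived from the jar path instead of being typed twice. A minimal sketch that prints such a command; jetty_cmd is a hypothetical helper, and the /usr/local/opt/solr path merely stands in for whatever brew --prefix solr returns on your machine.

```shell
# Print a start.jar command with jetty.home set to the jar's own directory.
jetty_cmd() {
  solr_home=$1
  start_jar=$2
  jetty_home=$(dirname "$start_jar")
  printf 'java -Dsolr.solr.home=%s -Djetty.home=%s -jar %s\n' \
    "$solr_home" "$jetty_home" "$start_jar"
}

jetty_cmd /Users/jwwest/solr /usr/local/opt/solr/libexec/example/start.jar
```

Here both properties are placed before -jar so that the JVM receives them as system properties.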
You can see the effective command line start.jar generates by using the --dry-run command line flag.
java -jar start.jar --dry-run
That will output everything with full path names so you can run it from outside the directory.
Source: http://www.eclipse.org/jetty/documentation/9.0.0.M3/advanced-jetty-start.html
The start.jar is a jetty specific mechanism that works to build out all the classpath requirements for starting up Jetty. It is generally only used in the scope of the jetty distribution. Pulling the start.jar out of the configuration and placing it somewhere else renders the default configuration of the start.config rather moot.
My understanding of Solr is that it bundles itself with a distribution of jetty, placing what it needs to run into the distribution and repackages it as its own. They may have a custom start.config file that further adds its own locations for classpath resources and the like, or not.
The exception you are seeing stems from the start.config file expecting an etc/ directory containing jetty.xml-formatted XML files, which are used to configure the Jetty process.
The fact that Jetty is often used in embedded form has little to do with this issue; embedding is simply a common use case because Jetty is incredibly easy to embed into an application. Embedded instances of Jetty rarely (if ever) leverage a start.jar; instead, it is up to the embedding application to manage its own classpath.
First, change to the folder where start.jar is located, then execute the same command.
Jetty is often used as an embedded container. If you want to use Jetty, a good start would be to copy the example directory and rename it to whatever you want. The solr directory is the one for basic configuration.
Otherwise, it is recommended to use Tomcat and the solr.war file.

How exactly does Tomcat run out of CATALINA_HOME and CATALINA_BASE

I'm having trouble finding documentation regarding this. After some googling I found that bin, conf, logs, temp, webapps, and work are directories that should exist in CATALINA_BASE.
temp, logs, webapps, bin and work I don't have any trouble understanding.
bin I suppose is just another bin folder, if for some reason both CATALINA_HOME and CATALINA_BASE are in PATH, then scripts in both folders will be available for execution.
But how about conf? Will the content of CATALINA_HOME/conf be totally ignored if CATALINA_BASE is set? Suppose I only need to customize a few config files per CATALINA_BASE: would I still need to keep a complete set of config files in CATALINA_BASE/conf, or could the standard config files in CATALINA_HOME/conf be shared?
And ditto for CATALINA_BASE/lib: would this work as a "global" lib folder per instance?
You can find the answer in the Tomcat documentation:
http://tomcat.apache.org/tomcat-6.0-doc/RUNNING.txt
Advanced Configuration - Multiple Tomcat Instances
In many circumstances, it is desirable to have a single copy of a
Tomcat binary distribution shared among multiple users on the same
server. To make this possible, you can set the $CATALINA_BASE
environment variable to the directory that contains the files for your
'personal' Tomcat instance.
When you use $CATALINA_BASE, Tomcat will calculate all relative
references for files in the following directories based on the value
of $CATALINA_BASE instead of $CATALINA_HOME:
bin - Only setenv.sh (*nix), setenv.bat (windows) and tomcat-juli.jar
conf - Server configuration files (including server.xml)
logs - Log and output files
webapps - Automatically loaded web applications
work - Temporary working directories for web applications
temp - Directory used by the JVM for temporary files (java.io.tmpdir)
Note that by default Tomcat will first try to load classes and JARs
from $CATALINA_BASE/lib and then $CATALINA_HOME/lib. You can place
instance specific JARs and classes (e.g. JDBC drivers) in
$CATALINA_BASE/lib whilst keeping the standard Tomcat JARs in
$CATALINA_HOME/lib.
If you do not set $CATALINA_BASE, $CATALINA_BASE will default to the
same value as $CATALINA_HOME, which means that the same directory is
used for all relative path resolutions.
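Going by the quoted text, conf is resolved against $CATALINA_BASE, so the conservative choice is to give each instance a full copy of the configuration files and edit only the ones that differ. A minimal sketch of setting up such an instance directory; make_catalina_base is a hypothetical helper, and the paths in the usage note are assumptions.

```shell
# Create a per-instance CATALINA_BASE skeleton for a shared CATALINA_HOME.
make_catalina_base() {
  home=$1   # shared Tomcat installation (CATALINA_HOME)
  base=$2   # instance directory to create (CATALINA_BASE)
  for d in conf logs webapps work temp; do
    mkdir -p "$base/$d" || return 1
  done
  # conf is resolved against CATALINA_BASE, so start from a full copy.
  cp -R "$home/conf/." "$base/conf/" || return 1
}
```

With, say, /opt/tomcat as the shared installation and /srv/tomcat-a as the instance, you would call make_catalina_base /opt/tomcat /srv/tomcat-a and then start the instance with both variables set, for example CATALINA_HOME=/opt/tomcat CATALINA_BASE=/srv/tomcat-a /opt/tomcat/bin/catalina.sh start. Instance-specific JARs would go in /srv/tomcat-a/lib, as the note about $CATALINA_BASE/lib describes.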

Resources