Nutch is not working properly - Solr

Hello guys, I am new to Nutch for web crawling. I followed the steps in the
official Nutch tutorial.
I typed this command in the terminal:
$ bin/crawl -i -D solr.server.url=http://localhost:8983/solr/ urls/ TestCrawl/ 2
where urls/ contains a seed file with the website name and TestCrawl/ is my crawl db directory.
It gives me an error with exit value 255. I am not sure what this error is, so I am posting a screenshot of my terminal here.

Did you build Nutch using the ant clean and ant runtime commands?
If yes, then rather than running Nutch from $NUTCH_HOME/bin/nutch, use the one located at $NUTCH_HOME/runtime/local/bin/nutch (the same applies to bin/crawl).
If not, first build it using the ant runtime command.
HTH.
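A minimal shell sketch of that check, assuming a Nutch 1.x source checkout in which the runtime Ant target populates runtime/local/bin. The function name nutch_runtime_bin is hypothetical and not part of Nutch; it only prints the built bin directory, or tells you to build first.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print $NUTCH_HOME/runtime/local/bin
# if the built nutch and crawl scripts are there, otherwise fail with a hint.
# The ( ) body runs in a subshell, so variables stay local and "exit" only
# leaves the function.
nutch_runtime_bin() (
  nutch_home=$1
  bin_dir="$nutch_home/runtime/local/bin"
  if [ -x "$bin_dir/nutch" ] && [ -x "$bin_dir/crawl" ]; then
    printf '%s\n' "$bin_dir"
  else
    echo "No built runtime under $nutch_home; run: cd \"$nutch_home\" && ant clean runtime" >&2
    exit 1
  fi
)

# Example usage (same crawl arguments as in the question):
#   BIN=$(nutch_runtime_bin "$NUTCH_HOME") &&
#     "$BIN/crawl" -i -D solr.server.url=http://localhost:8983/solr/ urls/ TestCrawl/ 2
```

Because urls/ and TestCrawl/ are relative paths, run the crawl from the directory that actually holds them.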

Related

Solr Error: Unable to create core [mycore] Caused by solr.ICUCollationField

I am trying to create a Solr core. I am using DrupalVM with Vagrant and VirtualBox.
When setting up Solr with this command:
sudo su - solr -c "/opt/solr/bin/solr create -c m4m -d /tmp/search_api_solr/solr-conf/7.x/"
I am getting this error:
INFO - 2018-11-05 19:21:45.804; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
ERROR: Error CREATEing SolrCore 'mycore': Unable to create core [mycore] Caused by: solr.ICUCollationField
Creating a core without specifying the -d <confdir> option succeeds, but gives me some really weird errors in the Solr dashboard and the Drupal UI, which, according to my research, have something to do with a corrupted core.
Any help with why I am getting this error would be much appreciated. Other developers using the same Vagrant installation are running it without issue.
If you create the core without the config directory, Solr will use its default configuration.
That configuration has none of the field definitions Drupal needs, and so forth.
If you know a little bit about Solr's structure and you use Solr version 7 or above, here is what you need to do:
Go to the WEB-INF/lib folder of your Solr installation:
cd /PATH_TO_SOLR/server/solr-webapp/webapp/WEB-INF/lib
Copy all jars from the analysis-extras folder to your WEB-INF/lib folder:
cp /PATH_TO_SOLR/contrib/analysis-extras/lib/*.jar ./
Restart Solr the way you normally do, then create the core again, specifying your -d config directory. That's important.
Hope this helps.
OR...
Save yourself the hassle and let the pros handle all this for you with a SaaS such as https://opensolr.com
You can create your Solr index with one click, and with two more clicks you upload your config files and you're done.
I needed jars from two directories:
cd /PATH_TO_SOLR
cp solr/contrib/analysis-extras/lib/*.jar solr/server/solr-webapp/webapp/WEB-INF/lib/
cp solr/contrib/analysis-extras/lucene-libs/*.jar solr/server/solr-webapp/webapp/WEB-INF/lib/
See solr/contrib/analysis-extras/README.txt.
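The two answers above can be combined into one small shell sketch. copy_analysis_extras is a hypothetical helper name; it assumes the Solr 7 layout used in the answers, where the argument is the directory that contains both contrib/ and server/ (/PATH_TO_SOLR in the first answer, /PATH_TO_SOLR/solr in the second). It copies only the two folders the answers name.

```shell
#!/bin/sh
# Hypothetical helper: copy the analysis-extras jars (lib/ and lucene-libs/)
# into Solr's webapp lib directory, as described in the answers.
# $1 = Solr install root containing contrib/ and server/ (adjust to your layout).
copy_analysis_extras() (
  solr_root=$1
  dest="$solr_root/server/solr-webapp/webapp/WEB-INF/lib"
  [ -d "$dest" ] || { echo "missing $dest" >&2; exit 1; }
  for src in "$solr_root/contrib/analysis-extras/lib" \
             "$solr_root/contrib/analysis-extras/lucene-libs"; do
    [ -d "$src" ] || { echo "missing $src" >&2; exit 1; }
    # Copy only .jar files; cp fails (and so does the function) if none match
    cp -p "$src"/*.jar "$dest"/ || exit 1
  done
)
```

After running it, restart Solr and create the core again with -d, as the first answer stresses.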

Could not find or load main class org.apache.solr.util.SolrCLI

I kept getting the error "Could not find or load main class org.apache.solr.util.SolrCLI" while trying to set up Solr on a Windows x64 machine.
The resolution to the problem is really simple.
1) Start Solr. Notice that you have to use solr.cmd instead of just solr:
bin/solr.cmd start
2) Then create the collection:
bin/solr.cmd create -c gettingstarted -p 8983
3) Then add files to the index using the post tool. You can execute the post command in two ways:
a) java -Dc=gettingstarted -jar post.jar *.json (run from the example/exampledocs directory, where post.jar lives)
or
b) bin/post -c gettingstarted example/exampledocs/books.json (bin/post is a Unix shell script, so on Windows it needs a Unix-like shell such as Cygwin; otherwise use option a)
Now you can navigate to your newly created collection 'gettingstarted' and query your books:
http://localhost:8983/solr/#/gettingstarted/query
Hope this saves someone's time.

Solr 5: Adding a new Core fails out of the box

I'm just playing around with Solr 5, and I tried to add a new core through the Admin UI and through the command line with:
bin/solr create -c new_core
But in both situations I get the following error:
new_core: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core new_core: Error loading solr config from /Users/blah/lib/solr-5.3.0/server/solr/new_core/conf/solrconfig.xml
I started my Solr server using this:
bin/solr start
I'm following the docs here:
https://cwiki.apache.org/confluence/display/solr/Running+Solr
So what's the fix? How was this supposed to work out of the box, given that I assume there must be some template the Admin UI uses when creating new cores?
The error occurs because there is no configuration for new_core in your setup: Solr cannot find its conf/solrconfig.xml.
Please perform the following steps (core_name stands for the name of your core):
mkdir /Users/blah/lib/solr-5.3.0/server/solr/core_name
echo "name=core_name" > /Users/blah/lib/solr-5.3.0/server/solr/core_name/core.properties
cp -r /Users/blah/lib/solr-5.3.0/server/solr/configsets/basic_configs/conf /Users/blah/lib/solr-5.3.0/server/solr/core_name/
Some important points to note:
core.properties and the conf directory must be placed in the same directory (the core's own directory).
The conf directory contains the schema.xml and solrconfig.xml files.
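The three steps can be wrapped in a shell function. This is a sketch assuming the Solr 5.3 layout from the question, where SOLR_HOME is server/solr and the configset is configsets/basic_configs; make_core_dir is a hypothetical name, and it refuses to overwrite an existing core directory.

```shell
#!/bin/sh
# Hypothetical helper: create a core directory by hand from basic_configs.
# $1 = SOLR_HOME (e.g. /Users/blah/lib/solr-5.3.0/server/solr), $2 = core name.
make_core_dir() (
  solr_home=$1
  name=$2
  conf_src="$solr_home/configsets/basic_configs/conf"
  core_dir="$solr_home/$name"
  [ -f "$conf_src/solrconfig.xml" ] || { echo "no configset at $conf_src" >&2; exit 1; }
  [ ! -e "$core_dir" ] || { echo "$core_dir already exists" >&2; exit 1; }
  mkdir -p "$core_dir" || exit 1
  # core.properties and conf/ must sit side by side in the core directory
  printf 'name=%s\n' "$name" > "$core_dir/core.properties" || exit 1
  cp -r "$conf_src" "$core_dir/" || exit 1
)
```

For the question's case that would be make_core_dir /Users/blah/lib/solr-5.3.0/server/solr new_core, followed by a Solr restart so core discovery picks up the new core.properties.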

Solr Admin UI Error 500 #/Object Expected

I need help here. I am using Solr and ran two commands: start, and create a core. But when I try to access the Admin UI, it pops up this error:
500 Error get #/Object Expected
and I cannot see the dashboard. The commands I used are just two simple steps:
bin\solr start
bin\solr create -c Test
The cmd window shows that the core was created successfully. Does anyone know why?
I am using Solr 5.1.0 and Java 1.7. Thanks a lot.

Fail to load ExtractingRequestHandler when running the Solr Quickstart Tutorial

I installed Solr 5.0.0 on OS X 10.10.2 using Homebrew. I am trying to follow the quick start instructions and am getting errors when I try to index a directory of files.
I am able to successfully start the sample Solr server by running
bin/solr start -e cloud -noprompt
as directed by the tutorial. I then try to index a directory of files by running
./bin/post -c gettingstarted docs/
(Note that this has to be done from the libexec subdirectory of the Solr install root.)
I get a server error 500 for every file it tries to add. The relevant stack:
Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.extraction.ExtractingRequestHandler'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:492)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:423)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:559)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:632)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.createRequestHandler(RequestHandlers.java:326)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:298)
... 30 more
The issue appears to be that ExtractingRequestHandler is not on the classpath.
ExtractingRequestHandler is in the solr-cell-5.0.0.jar.
jar tf dist/solr-cell-5.0.0.jar | grep ExtractingRequestHandler
org/apache/solr/handler/extraction/ExtractingRequestHandler.class
It's not clear to me whether it needs to be on the classpath of the command doing the posting or of the Solr instance. The answer to this question makes it sound like the latter. However, I tried setting
export CLASSPATH=dist/solr-cell-5.0.0.jar
before trying to index the files and saw the same error.
I don't see anything in the tutorial about how to configure this. What is the error and how do I get past it?
It looks like the problem is incorrect paths in the Solr example configuration. A workaround is to add softlinks SOLR_ROOT/contrib and SOLR_ROOT/dist pointing to the corresponding directories beneath SOLR_ROOT/libexec.
Details here and here.
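The softlink workaround can be sketched as follows, assuming (as the answer says) that the example configuration expects contrib/ and dist/ directly under SOLR_ROOT while the Homebrew install keeps them under SOLR_ROOT/libexec. link_solr_libs is a hypothetical helper; the links use relative targets so they stay valid even if SOLR_ROOT is given as a relative path.

```shell
#!/bin/sh
# Hypothetical helper: create SOLR_ROOT/contrib -> libexec/contrib and
# SOLR_ROOT/dist -> libexec/dist. $1 = the directory that contains libexec/.
link_solr_libs() (
  solr_root=$1
  for d in contrib dist; do
    [ -d "$solr_root/libexec/$d" ] || { echo "missing $solr_root/libexec/$d" >&2; exit 1; }
    link="$solr_root/$d"
    # Leave anything already present at the link path untouched
    if [ -e "$link" ] || [ -L "$link" ]; then
      continue
    fi
    # Relative target: resolved against the link's own directory (SOLR_ROOT)
    ln -s "libexec/$d" "$link" || exit 1
  done
)
```

Restart Solr afterwards so the cores reload their configuration.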
