Hi, I am creating a search engine for a website using Nutch and Solr, but I am unable to execute the bin/nutch command in my command prompt. Can I execute it in the Windows command prompt, or should I use Cygwin?
I am using Solr 3.6.2 and Nutch 1.7. Please provide a solution as soon as possible.
Use Cygwin. bin/nutch is a shell script, so the Windows command prompt cannot run it directly. Here's an excellent guide to setting them up together:
http://amac4.blogspot.com/2013/07/setting-up-solr-with-apache-tomcat-be.html
I installed both Apache Zeppelin and HBase via Homebrew, and they both worked on their own. I was able to use the HBase shell on the command line and open Zeppelin. I tested Zeppelin with Spark and it worked fine.
However, my problem is how to configure the HBase interpreter. I tried to follow the tutorials given by Zeppelin and it didn't work. This is the error message I got:
I tried to resolve this by resetting the interpreter in the interpreter menu like this, but that didn't help. Any help is appreciated.
UPDATE:
I was able to resolve the problem by adding the following dependencies to the HBase interpreter in Apache Zeppelin:
/usr/local/hbase-1.2.0/lib/hbase-client-1.2.0.jar
/usr/local/hbase-1.2.0/lib/hbase-protocol-1.2.0.jar
/usr/local/hbase-1.2.0/lib/hbase-common-1.2.0.jar
Note: /usr/local/hbase-1.2.0 is the HBase home directory.
Reference:
https://stochasticcoder.com/2018/02/12/adding-hbase-interpreter-to-zeppelin-hortonworks/ (Thanks to @Alan)
For a complete guide to installing and configuring the HBase interpreter on Apache Zeppelin, see my repo: https://github.com/bixingxie/hbase-zeppelin/blob/master/README.md
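Before adding these dependencies, it can help to confirm that the jars actually exist at those paths. Here is a minimal shell sketch, not part of the original fix: the function name missing_hbase_jars is made up for illustration, and it assumes the jars follow the lib/<name>-<version>.jar naming shown above.

```shell
# Print every expected HBase client jar that is missing under an HBase home.
# Hypothetical helper; usage: missing_hbase_jars HBASE_HOME HBASE_VERSION
missing_hbase_jars() {
  hbase_home=$1
  hbase_version=$2
  for name in hbase-client hbase-protocol hbase-common; do
    jar="$hbase_home/lib/$name-$hbase_version.jar"
    # Report the path only when the file is absent.
    [ -f "$jar" ] || printf '%s\n' "$jar"
  done
}
```

For the setup above, running missing_hbase_jars /usr/local/hbase-1.2.0 1.2.0 prints nothing when all three jars are present, and otherwise lists the paths that still need to be fixed.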
Hello guys, I am new to Nutch and web crawling. I followed the steps in the
tutorial on the official Nutch site.
I typed this command in the terminal:
$ bin/crawl -i -D solr.server.url=http://localhost:8983/solr/ urls/ TestCrawl/ 2
where urls contains the seed file with the website name and TestCrawl is my db directory.
It gives me an error with exit value 255. I am not sure what this error means, so I am posting a screenshot of my terminal here.
Did you build Nutch using the ant clean and ant runtime commands?
If yes, then rather than running Nutch via the script at $NUTCH_HOME/bin/nutch, use the one located at $NUTCH_HOME/runtime/local/bin/nutch.
If not, then first build it using the ant runtime command.
HTH.
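As a rough sketch of the check described above (the function name nutch_run_dir is hypothetical, and it assumes a source checkout where ant runtime creates runtime/local):

```shell
# Print the directory to run Nutch from, or explain that a build is needed.
# Hypothetical helper; usage: nutch_run_dir "$NUTCH_HOME"
nutch_run_dir() {
  nutch_home=$1
  if [ -f "$nutch_home/runtime/local/bin/nutch" ]; then
    # "ant runtime" has already produced the runnable layout.
    printf '%s\n' "$nutch_home/runtime/local"
  else
    # Nothing built yet: tell the user what to run, and fail.
    echo "runtime/local not found; run 'ant runtime' in $nutch_home first" >&2
    return 1
  fi
}
```

With that in place, the crawl from the question could be started with cd "$(nutch_run_dir "$NUTCH_HOME")" followed by the same bin/crawl command, so that the scripts under runtime/local/bin are the ones being used.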
Is there a way to add username/password parameters to the following Solr update command?
...........jdk7x64\bin\java -jar -Durl=http://localhost:8983/solr/collection1/update post.jar test_data.xml
Or is there any other way to post files to a password-protected Solr instance?
Can you try the command below? I hope this helps.
jdk7x64\bin\java -jar -Durl=http://username:password@localhost:8983/solr/collection1/update post.jar test_data.xml
This adds username:password to the URL just before the hostname, separated from it by @.
According to Solr Issue, it is available from Solr 4.8.
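Another option, assuming the server is protected with HTTP Basic authentication (the question does not say which scheme is used), is to skip post.jar and send the file with curl, whose --user option supplies the credentials. This is only a sketch: the function name solr_post_xml and the CURL override variable are made up for illustration, and the URL and file name in the usage note are the ones from the question.

```shell
# Post an XML update file to Solr, authenticating with HTTP Basic auth.
# Set CURL=echo to preview the command line instead of sending anything.
# Hypothetical helper; usage: solr_post_xml UPDATE_URL USER PASSWORD FILE
solr_post_xml() {
  update_url=$1
  user=$2
  password=$3
  file=$4
  # --fail makes curl exit non-zero on HTTP errors such as 401 or 500.
  "${CURL:-curl}" --fail --user "$user:$password" \
    -H 'Content-Type: text/xml' \
    --data-binary "@$file" \
    "$update_url?commit=true"
}
```

For example: solr_post_xml http://localhost:8983/solr/collection1/update username password test_data.xml. Note that a password passed on the command line can be visible to other users on the same machine.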
I've just started to learn Solr, and for the last 3 days I've been stuck. I cannot
index rich documents on Solr 3.6 or 4.0. I am using Windows 7 64-bit.
Here is what I tried:
First I installed Solr 3.6 with tomcat-jetty, using BitNami Apache.
1. Tried the -Durl command. What I got:
error #500 lazy loading error
2. Downloaded curl for my Windows machine and tried curl. I got: error #500 lazy loading error
3. Copied a program from the Solr tutorial that uploads a file using SolrJ
into the NetBeans IDE and tried to index a PDF file using
update/extract.
Then I got:
org.apache.solr.common.SolrException: Server at
"myServer:port/solr" returned non ok status:500, message:Internal
Server Error
4. Changed solrconfig.xml to remove startup=lazy from the update/extract
request handler, and got the same thing.
I reinstalled Solr 3.6 but still can't get it to work. 4.0 gives the same error.
The same problem occurs with some other request handlers as well, such as /browse,
etc.
Should I switch to Linux?
Looks like the packager (Bitnami) did not include that library, even though they left Solr configured to use that library. You may ask them to resolve it. Or you can deploy it yourself.
Here's how to deploy Solr on Tomcat. It's equally easy to install on Windows, and it starts as a Windows service. Once installed, to enable rich document support, copy the contents of contrib/extraction/lib/ to a directory and point the sharedLib attribute in solr.xml to that directory. If you have used that guide, you will understand those new terms :-)
I have Ubuntu Linux 12.04 installed, and I'm trying to install Nutch 1.5.1 and Solr 3.6.1 and integrate them to crawl seed URLs.
I'm using this tutorial to get this to work.
I followed the steps before 3.2, skipped to step 4, and I can access
localhost:8983/solr/admin/
without error.
But after going to step 6 and copying schema.xml from Nutch's conf folder to Solr's example/solr/conf folder,
the solr/admin page shows the Java error below:
How can I handle that?
One more thing to ask:
I have another tutorial for this that looks good, but its first step says to add some code to the nutch-site.xml file in the /conf/ and /runtime/local/conf/ folders,
and there is no runtime folder in my Nutch folder. Step 4 mentions this folder too.
Any suggestions?
Thanks in advance.
This is a bit of a red herring. The line that specifies the version number, something like:
<schema name="nutch" version="1.5.1">
is causing it, because the value of version is parsed as a float. Remove the extra dot: change it to 1.5 or 1.51 to make it a valid float, then restart your Solr instance. The exception should disappear.
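To spot this kind of value before restarting Solr, a small shell check along these lines may help. It is only a sketch: the function names schema_version and is_float_version are made up, and the pattern assumes the version attribute sits on the same line as the opening <schema tag, as in the snippet above.

```shell
# Print the version attribute of the <schema ...> element in a schema.xml file.
# Hypothetical helper; usage: schema_version path/to/schema.xml
schema_version() {
  sed -n 's/.*<schema[^>]*[[:space:]]version="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed only when the value is a plain decimal number such as 1.5 or 1.51.
is_float_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)?$'
}
```

For example, is_float_version "$(schema_version example/solr/conf/schema.xml)" || echo "schema version is not a valid float" flags the 1.5.1 value, while 1.5 and 1.51 pass.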
Please also check whether Nutch 1.5.1 and Solr 3.6.1 are compatible (that is, whether they ship the same versions of the lucene-core and solr-solrj jars). I have had problems with incompatible versions, though not with 1.5/3.6.