Trying, in vain so far, to make Nutch + Solr work. I'm having a very hard time understanding how to set up Nutch and Solr. I have followed all the tutorials I could find on the internet, most of them for older versions, but I still could not make any of them work. At the moment I'm following this guide.
I have unpacked Nutch 2.2.1, Solr 4.3.1, and HBase 0.90.4 to a directory on my XAMPP local server (none of the tutorials said where I should unpack them, so I assumed the local server).
I'm using Cygwin on Windows 7. JAVA_HOME points to /cygdrive/c/PROGRA~1/java/jdk1.8.0_05
I am stuck at the Configure HBase step. As the tutorial dictates, I have configured /hbase-0.90.4/conf/hbase-site.xml as follows:
<property>
<name>hbase.rootdir</name>
<value>file:///C:/xampp/htdocs/trynutch/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>C:/xampp/htdocs/trynutch/zookeeper</value>
</property>
As per the tutorial, after this I should be able to run the following command:
$ ./trynutch/hbase/bin/start_hbase.sh
When I run it in the Cygwin terminal, it gives an error:
DM#comp ~
$ cd C:/xampp/htdocs/trynutch/hbase-0.90.4/bin
DM#comp /cygdrive/c/xampp/htdocs/trynutch/hbase-0.90.4/bin
$ start_hbase.sh
-bash: start_hbase.sh: command not found
I'd appreciate any information.
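Try the following command. Note that the script shipped with HBase is named start-hbase.sh (with a hyphen, not an underscore), and the leading ./ is needed because the current directory is normally not on PATH:
./start-hbase.sh
If it is not executable, make it executable first with the following command:
chmod a+x start-hbase.sh
Alternatively, run sh start-hbase.sh from the HBase bin directory: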
cd C:/xampp/htdocs/trynutch/hbase-0.90.4/bin
sh start-hbase.sh
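To see why the bare name fails while ./ works, here is a minimal sketch that can be run in any POSIX shell. It uses a throwaway directory and a stand-in script (the script contents are made up for the demo, not the real HBase launcher), and it assumes the current directory is not on PATH, which is the usual default.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1

# Stand-in for HBase's start-hbase.sh (hypothetical contents, for the demo only).
printf '#!/bin/sh\necho "hbase starting"\n' > start-hbase.sh
chmod a+x start-hbase.sh

# A bare name is looked up on PATH only, so this normally fails
# with "command not found" even though the file is right here.
if start-hbase.sh 2>/dev/null; then bare="found"; else bare="not found"; fi

# With an explicit ./ prefix the shell runs the file in the current directory.
prefixed="$(./start-hbase.sh)"

echo "bare name: $bare"
echo "with ./:   $prefixed"
```

The same applies in the Cygwin terminal from the question: cd into the bin directory, then call the script with ./ or through sh.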
I need to create an information retrieval system using Solr. Please advise how to do that on a Mac.
Some quick notes on how to install Solr on a Mac:
Download OpenJDK. I downloaded openjdk-14.0.1_osx-x64_bin.tar
Set the PATH environment variable to include
"your_path/jdk-14.0.1.jdk/Contents/Home/bin"
Download Ant (to compile Solr): apache-ant-1.9.15-bin.tar.bz2
Add it to the PATH as well: "your_path/apache-ant-1.9.15/bin"
From the Solr README.md, run "ant compile"
From the Solr README.md, run "ant server"
Run chmod +x solr* from the solr/bin directory
Run bin/solr start
Test http://localhost:8983/solr/
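The two PATH steps above could be scripted roughly as follows. This is only a sketch: path_prepend is a hypothetical helper name, and your_path is the same placeholder used in the notes, to be replaced with the directory where the archives were unpacked.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;     # otherwise put it in front
  esac
}

# "your_path" is a placeholder from the notes above, not a real directory.
path_prepend "your_path/jdk-14.0.1.jdk/Contents/Home/bin"
path_prepend "your_path/apache-ant-1.9.15/bin"
export PATH
```

Putting these lines in a shell startup file keeps java and ant available in new terminals, and the duplicate check keeps PATH from growing each time the file is sourced.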
So, I have two Solr node instances running along with an embedded ZooKeeper on a single machine, set up using the link Set up solrCloud. Now I want to add a new machine to this cluster. I run bin\solr start -cloud -s ./solr -h newMachineIP -p 9000 -z oldMachineIP:9983. It reports a successful startup, but when I create a new collection it gives me an error saying "Server refused connection at: http://newMachineIp:9000/solr"
Just a guess, but does C:\path\to\dir\solr-7.1.0\solr-7.1.0\server\solr\gettingstarted contain any spaces? If so, install Solr into a path with no spaces. This has been an issue before on Windows, and it's possible it still is in some code paths. Solr on Windows gets much less testing than on Linux.
I want to set up a cron job on Amazon EC2 Linux to run a Solr full-import at 12:15 AM every night.
Before setting up the cron job, I tested in the terminal whether the command works. I used the command below:
/usr/bin/lynx http://amzon-instance-ip:8983/solr/work/dataimport?command=full-import
Output of the command:
[1] 15153
But when I go to the URL below to check whether the full-import was actually initiated, I see that the full-import command is not running.
http://amzon-instance-ip:8983/solr/#/workb/dataimport//dataimport
Can anyone help me understand why the Solr full-import is not running with the lynx command? Am I using lynx correctly, or do I need a different approach? Any suggestions, please.
I spent some time searching the internet for why a URL does not work with lynx but could not find a solution.
Thanks to @Oyeme's suggestion, I found two ways to request my URL, using the Linux curl and wget commands.
Using the curl command:
curl -s 'http://amzon-instance-ip:8983/solr/work/dataimport?command=full-import&clean=false' > /dev/null
Using the wget command:
wget -O /dev/null 'http://amzon-instance-ip:8983/solr/work/dataimport?command=full-import&clean=false'
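For the nightly 12:15 AM run the question asks about, a crontab entry can wrap the same curl call. The sketch below only prints such an entry. solr_import_cron_line is a hypothetical helper name, and the host and core are the placeholders used above. Keeping the URL in single quotes matters: an unquoted & makes the shell put the command in the background, which is when it prints job output such as [1] 15153.

```shell
# Hypothetical helper: print a crontab entry that requests the full-import
# at minute 15 of hour 0 (12:15 AM) every day.
solr_import_cron_line() {
  host="$1"   # Solr host, e.g. the placeholder amzon-instance-ip
  core="$2"   # Solr core name, e.g. work
  # The URL is wrapped in single quotes so cron's shell does not treat & or ? specially.
  printf '15 0 * * * curl -s %s > /dev/null 2>&1\n' \
    "'http://${host}:8983/solr/${core}/dataimport?command=full-import&clean=false'"
}

solr_import_cron_line amzon-instance-ip work
```

The printed line can be pasted into crontab -e. The five leading fields are minute, hour, day of month, month, and day of week.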
I was installing PostgreSQL on Ubuntu using Linuxbrew:
brew install postgresql
It seemed to work fine, but since I was installing PostgreSQL for the first time, I then tried creating a database:
initdb /usr/local/var/postgres -E utf8
but it returned:
initdb: command not found
I tried running the command with sudo, but that didn't help.
Run locate initdb; it should give you a list to choose from, something like:
MacBook-Air:~ vao$ locate initdb
/usr/local/Cellar/postgresql/9.5.3/bin/initdb
/usr/local/Cellar/postgresql/9.5.3/share/doc/postgresql/html/app-initdb.html
/usr/local/Cellar/postgresql/9.5.3/share/man/man1/initdb.1
/usr/local/Cellar/postgresql/9.6.1/bin/initdb
/usr/local/Cellar/postgresql/9.6.1/share/doc/postgresql/html/app-initdb.html
/usr/local/Cellar/postgresql/9.6.1/share/man/man1/initdb.1
/usr/local/bin/initdb
/usr/local/share/man/man1/initdb.1
So in my case I want to run
/usr/local/Cellar/postgresql/9.6.1/bin/initdb
If you don't have mlocate installed, either install it or use
sudo find / -name initdb
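The lookup can also be scripted. This is a rough sketch: find_initdb is a hypothetical helper, and the candidate directories are simply the ones from my locate output above, so adjust the versions to whatever is installed on your machine.

```shell
# Hypothetical helper: print the first initdb found on PATH,
# or else in one of the directories passed as arguments.
find_initdb() {
  if command -v initdb >/dev/null 2>&1; then
    command -v initdb
    return 0
  fi
  for dir in "$@"; do
    if [ -x "$dir/initdb" ]; then
      printf '%s\n' "$dir/initdb"
      return 0
    fi
  done
  return 1
}

# Candidate directories taken from the locate output above (assumed, not universal).
find_initdb /usr/local/Cellar/postgresql/9.6.1/bin /usr/local/Cellar/postgresql/9.5.3/bin \
  || echo "initdb not found in the listed locations"
```

If it prints a path, run initdb through that full path or add its directory to PATH.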
There's a good answer to a similar question on SuperUser.
In short:
Postgres groups databases into "clusters", each of which is a named collection of databases sharing a configuration and data location, and running on a single server instance with its own TCP port.
If you only want a single instance of Postgres, the installation includes a cluster named "main", so you don't need to run initdb to create one.
If you do need multiple clusters, then the Postgres packages for Debian and Ubuntu provide a different command pg_createcluster to be used instead of initdb, with the latter not included in PATH so as to discourage end users from using it directly.
And if you're just trying to create a database, not a database cluster, use the createdb command instead.
I had the same problem and found the answer here.
The Ubuntu path is
/usr/lib/postgresql/9.6/bin/initdb
Edit: Sorry, Ahmed asked about linuxbrew, I'm talking about Ubuntu.
I hope this answer helps somebody.
I had a similar issue caused by brew install postgresql not properly linking postgres. The fix for me was to run:
brew link --overwrite postgresql
You can add the directory to PATH so that initdb can be run from any location:
sudo nano ~/.profile
Inside nano, go to the end of the file and add the following:
# add the PostgreSQL 14 bin directory to PATH if it exists
if [ -d "/usr/lib/postgresql/14/bin/" ] ; then
PATH="/usr/lib/postgresql/14/bin/:$PATH"
fi
Then register initdb as an alternative:
sudo update-alternatives --install /usr/bin/initdb initdb /usr/lib/postgresql/14/bin/initdb 1
I'm trying to run Zeppelin on Ubuntu 14 with Hadoop 1.0.3 and Spark 1.4.0.
I've finished building the source code, and all of the packages built successfully. But when I run the daemon, it fails and says that the Zeppelin process has died.
Any ideas where this is going wrong?
It says that it can't find the logs folder and the run folder, which are definitely there.
Joseph,
I suggest that you test your Zeppelin package first:
mvn verify
or check whether your Zeppelin process is alive:
ps -aux | grep zeppelin
Try running Zeppelin with sudo:
sudo bin/zeppelin-daemon.sh start
This works for me:
ps -ef | grep "zeppelin"
kill -9 pid
sudo bin/zeppelin-daemon.sh restart
It could be an error caused by the JDK version; at least that was the case for me.
Try updating the JDK and building again.
Also, make sure you are building it with the correct command:
mvn clean package -Pspark-1.4 -Dhadoop.version=2.2.0 -Phadoop-2.2 -DskipTests
If you are running Zeppelin in a virtual machine, make sure that it has enough RAM and CPU. I ran into your problem when I was using VirtualBox with the default settings. When I increased the CPUs to 2 and the RAM to 4096 MB, everything worked fine. This is because Zeppelin runs Spark by default, and Spark is very resource intensive, locally and otherwise.
I had the same issue and tried the proposed answers but none worked for me. Here is what did work for me:
Download the binaries, then download the build requirements:
sudo apt install openjdk-8-jdk npm libfontconfig r-base-dev r-cran-evaluate
sudo apt install maven
Go to the Zeppelin directory and run:
sudo bin/zeppelin-daemon.sh start
Go to localhost:8080 in your browser.
I had the same issue just now, so I checked environment compatibility with my CDH and found a Java compatibility issue.
I ran yum install java-1.8.0-openjdk and then started all the Hadoop services along with Spark.
Then I started Zeppelin. I had put the zeppelin folder under root, so I used:
sudo zeppelin/bin/zeppelin-daemon.sh start
Or
zeppelin/bin/zeppelin-daemon.sh start
I took Kangrok Lee's suggestion and ran mvn verify on my system. It told me that JAVA_HOME was not set and that JAVA_HOME must point to a JDK, not a JRE.
The following steps fixed it for me:
Make sure you have a JDK installed on the system where you are trying to run Zeppelin.
Make sure the JAVA_HOME environment variable points to your JDK and not a JRE.
Once both are in place, zeppelin-daemon.sh start / restart should work for you. No need to use sudo.
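A quick way to check the second point before starting the daemon is to look for the compiler, which a JDK ships and a plain JRE does not. This is a minimal sketch; is_jdk_home is a hypothetical helper name, and it assumes a Linux-style layout with bin/javac under the Java home.

```shell
# Hypothetical helper: succeed only if the directory looks like a JDK,
# i.e. it contains an executable bin/javac (a JRE has java but no javac).
is_jdk_home() {
  [ -n "$1" ] && [ -x "$1/bin/javac" ]
}

if is_jdk_home "${JAVA_HOME:-}"; then
  echo "JAVA_HOME points to a JDK: $JAVA_HOME"
else
  echo "JAVA_HOME is unset or has no bin/javac (possibly a JRE)"
fi
```

If the second message appears, export JAVA_HOME to the JDK directory in the same shell (or in conf/zeppelin-env.sh) before running zeppelin-daemon.sh.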