Solr and Nutch Docker containers working together?

I'm trying to run Nutch and Solr as Docker containers out of the box, so that I can consume them over REST.
First I tried a single container with both, started with:
docker run --name nutch_solr -d -p 8899:8899 -p 8983:8983 -t momer/docker-solr-nutch:4.6.1
but I can't browse to ...:8983 or ...:8899.
Then I tried separate containers for Nutch and Solr, starting Nutch with:
docker run --name my_nutch -d -p 8899:8899 -e SOLRURL=192.168.99.100:8983 -t meabed/nutch
Solr is up and running, but I can't get Nutch to work with it (this is my actual question).
I tried yet another Nutch image. What am I missing? I'd like to write a post on how to bring up Solr and Nutch containers in 5 minutes, but I just can't get it working.
So, does anyone know how to get this running, including submitting a sample crawl job?
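A quick way to narrow down the "can't browse to :8983/:8899" part is to check from the host whether anything answers on the published ports at all. Below is a minimal Python sketch, assuming Python 3 on the host. The address 192.168.99.100 comes from the SOLRURL above and looks like a docker-machine/Docker Toolbox VM; if that is the case, published ports are reachable on that VM address rather than on localhost. The helper names port_open and check_ports are hypothetical.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and connects; the with-block closes the socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable network, DNS failure, ...
        return False


def check_ports(host: str, ports=(8983, 8899)) -> dict:
    """Map each port to True/False; the defaults are the Solr and Nutch ports from the docker run lines."""
    return {port: port_open(host, port) for port in ports}
```

For example, if check_ports("192.168.99.100") came back as {8983: True, 8899: False}, that would mean Solr is published but nothing is listening on 8899.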

Related

How to start Solr Cloud collection without -e cloud

I use SolrCloud for my project and I always start it with bin/solr start -e cloud, which always prompts me with all the questions for creating a new collection or reusing one.
But is there a command that directly starts the collection I want, like bin/solr start [collectionName] -cloud -p 8983?
I didn't find anything in the Solr Manual.
Thanks :)
All collections are available by default, so starting Solr in cloud mode should be enough.
bin/solr start -c
The default port is 8983, so there is no need to give the -p parameter unless you're changing it.
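To confirm that starting in cloud mode is enough, you can ask the Collections API which collections the cluster knows about once bin/solr start -c is up. A minimal Python sketch, assuming Solr on the default http://localhost:8983/solr and a Solr version whose Collections API supports action=LIST; the helper names are hypothetical.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen


def list_collections_url(base_url: str = "http://localhost:8983/solr") -> str:
    """Collections API LIST request asking for a JSON response."""
    return f"{base_url}/admin/collections?{urlencode({'action': 'LIST', 'wt': 'json'})}"


def parse_collections(body: str) -> list[str]:
    """Pull the collection names out of a LIST response body."""
    return json.loads(body).get("collections", [])


def fetch_collections(base_url: str = "http://localhost:8983/solr") -> list[str]:
    """Call a running Solr node and return the collections it reports."""
    with urlopen(list_collections_url(base_url), timeout=10) as resp:
        return parse_collections(resp.read().decode("utf-8"))
```

Calling fetch_collections() right after bin/solr start -c should list your existing collections without going through the -e cloud prompts.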

SOLR full-import not working when running using lynx command

I want to set up a cron job on Amazon EC2 Linux to run a Solr full-import at 12:15 AM every night.
Before setting up the cron job, I tested the command in the terminal to check whether it works:
/usr/bin/lynx http://amzon-instance-ip:8983/solr/work/dataimport?command=full-import
Output of the command:
[1] 15153
But when I go to the URL below to check whether the full-import actually started, I see that it is not running.
http://amzon-instance-ip:8983/solr/#/workb/dataimport//dataimport
Can anyone tell me why the Solr full-import does not run with the lynx command? Am I using lynx correctly, or do I need a different approach? Any suggestions, please.
I spent some time searching online for why a URL does not work with lynx but could not find a solution.
Thanks to @Oyeme's suggestion, I found two ways to get my URL running, using the Linux curl and wget commands.
Using curl:
curl -s 'http://amzon-instance-ip:8983/solr/work/dataimport?command=full-import&clean=false' > /dev/null
Using wget:
wget -O /dev/null 'http://amzon-instance-ip:8983/solr/work/dataimport?command=full-import&clean=false'
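On the lynx attempt: [1] 15153 is what bash prints when it sends a command to the background. That usually means an unquoted & in the URL (for example before clean=false) split the command line, so the single quotes in the curl and wget lines are what matter. lynx is also an interactive text browser, so it is a poor fit for cron unless it is run with an option such as -dump. For the nightly job, a crontab entry starting with 15 0 * * * followed by the quoted curl command runs it at 00:15. If you would rather avoid shell quoting altogether, the sketch below builds the DataImportHandler URLs in Python and also queries command=status, which reports whether an import is idle or busy. It assumes the placeholder host and the core name work from the question; the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder host and core name taken from the question; adjust for your instance.
SOLR_BASE = "http://amzon-instance-ip:8983/solr"
CORE = "work"


def dataimport_url(base: str, core: str, command: str, **params: str) -> str:
    """Build a DataImportHandler URL; urlencode joins the parameters with & safely."""
    query = urlencode({"command": command, **params})
    return f"{base}/{core}/dataimport?{query}"


def call(url: str, timeout: float = 30.0) -> str:
    """GET the URL and return the response body (raises URLError/HTTPError on failure)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def start_full_import() -> str:
    # Same request as the curl/wget lines: full-import without wiping the index first.
    return call(dataimport_url(SOLR_BASE, CORE, "full-import", clean="false"))


def import_status() -> str:
    # command=status reports whether the handler is idle or busy.
    return call(dataimport_url(SOLR_BASE, CORE, "status"))
```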

StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000 This site can’t be reached

I'm trying to run the Stanford CoreNLP server. I'm using the Docker files from this official Stanford list:
http://stanfordnlp.github.io/CoreNLP/other-languages.html#docker
My OS is Ubuntu 16.04 LTS. I don't know much about Ubuntu, coding, servers, or NLP.
I tried the first one on the list, https://hub.docker.com/r/motiz88/corenlp/. I ran it as is and got this far:
steve@ubuntu16-4lts:~$ docker run --name coreNPL --rm -i -t motiz88/corenlp
-- listing properties --
Starting server on port 9000 with timeout of 5000 milliseconds.
StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
But when I go to http://localhost:9000/ I get:
This site can’t be reached
localhost refused to connect.
The second one on the list got similar results.
https://github.com/chilland/corenlp-docker
Is there something else I'm supposed to do or configure? Is the Stanford CoreNLP server an HTTP server in its own right that will serve localhost:9000 by itself, or does it require the help of an Apache HTTP Server?
I've searched Stack Exchange for "[stanford-nlp] /0:0:0:0:0:0:0:0:9000" but could not find anything comparable to my situation.
The container's port 9000 has to be published to the host with -p, so the command would be:
docker run -p 9000:9000 --name coreNPL --rm -i -t motiz88/corenlp
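Once the port is published, the server can be called directly over HTTP; StanfordCoreNLPServer is a standalone HTTP server, so no Apache in front of it is needed. The sketch below assumes Python 3 and the container started with -p 9000:9000 as above. It posts raw text to the server root, passes the annotators in the properties URL parameter, and reads back JSON. The annotator list and the helper names are illustrative choices, not taken from the question.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def corenlp_request(text: str,
                    annotators: str = "tokenize,ssplit,pos",
                    server: str = "http://localhost:9000") -> Request:
    """Build a POST request: text in the body, annotation settings in ?properties=..."""
    props = json.dumps({"annotators": annotators, "outputFormat": "json"})
    url = f"{server}/?{urlencode({'properties': props})}"
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def annotate(text: str, **kwargs) -> dict:
    """Send the request to a running server and decode the JSON annotation."""
    with urlopen(corenlp_request(text, **kwargs), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, annotate("Stanford is in California.") should return the server's JSON output, with the per-sentence annotations under the "sentences" key.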

Solr instances/nodes in SolrCloud not linking

I am trying to set up SolrCloud with multiple instances, all on one machine; my ZooKeeper ensemble is running on the local machine.
For the first instance:
bin/solr start -c -s $INSATNCE_DIR -p $solrport -z localhost:2181,localhost:2182,localhost:2183 -Dsolr.log=/logs -Dsolr.install.dir=/solr-5.4.0 -V -Dbootstrap_conf=true
and for the other instances:
bin/solr start -c -s $INSATNCE_DIR -p $solrport -z localhost:2181,localhost:2182,localhost:2183 -Dsolr.log=/logs -Dsolr.install.dir=/solr-5.4.0 -V -Dbootstrap_conf=false
Everything starts fine, but in each instance's admin interface, Cloud > Tree lists only one live node:
localhost:8983_solr
and not the other instances.
This causes problems when I create a collection that should be distributed across a node set.
Does anyone have an idea what I am doing wrong?
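One way to see which nodes actually registered in ZooKeeper, independent of the admin UI, is the Collections API CLUSTERSTATUS action, whose response lists live_nodes under cluster. The sketch below assumes Python 3 and that each instance was started with its own -p port. It builds that request and reports which expected ports have no live node. The port 7574 in the usage note and the helper names are hypothetical; node names are assumed to look like localhost:8983_solr, as shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def clusterstatus_url(base_url: str) -> str:
    """Collections API CLUSTERSTATUS request with a JSON response."""
    return f"{base_url}/admin/collections?{urlencode({'action': 'CLUSTERSTATUS', 'wt': 'json'})}"


def live_nodes(body: str) -> list:
    """Extract the live node names (e.g. 'localhost:8983_solr') from the response body."""
    return json.loads(body)["cluster"]["live_nodes"]


def missing_ports(nodes, expected_ports) -> list:
    """Expected ports that do not appear in any live node name of the form host:port_context."""
    seen = {int(node.split("_", 1)[0].rsplit(":", 1)[1]) for node in nodes}
    return sorted(set(expected_ports) - seen)


def fetch_live_nodes(base_url: str = "http://localhost:8983/solr") -> list:
    """Ask a running node for the cluster's live nodes."""
    with urlopen(clusterstatus_url(base_url), timeout=10) as resp:
        return live_nodes(resp.read().decode("utf-8"))
```

For instance, missing_ports(fetch_live_nodes(), [8983, 7574]) would return [7574] if only localhost:8983_solr is registered, meaning the second instance never registered with this ensemble.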

How does Solr load configsets in the examples provided by Solr?

I have started learning Solr. I downloaded the latest zip (5.1.0) and ran the server using bin/solr start -e cloud -noprompt.
I see that this internally calls:
bin/solr start -cloud -s example/cloud/node1/solr -p 8983
bin/solr start -cloud -s example/cloud/node2/solr -p 7574 -z localhost:9983
I see that there is no config (conf/solrconfig.xml) defined in example/cloud/node1/solr, so how does Solr load its config from the SOLR_HOME/configsets directory?
I have read the documentation in several places, but I still can't figure out the role of -cloud in 'bin/solr start -cloud -s ...' or how ZooKeeper is used.
Please help.
When you work with SolrCloud and ZooKeeper, you have to upload your Solr config to ZooKeeper:
./bin/solr zk -upconfig -z localhost:2181,localhost:2182,localhost:2183 -n my-config -d server/solr/files/conf/
With -upconfig you upload your Solr config; you only have to provide the path to your config directory (-d) and a name for it (-n).
You can then use that config name (my-config) to create a collection through the Collections API:
http://XXX.XXX.XXX.XXX:8983/solr/admin/collections?action=CREATE&name=irTest&numShards=3&replicationFactor=2&maxShardsPerNode=3&collection.configName=my-config
The collection is then created using your config.
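The CREATE call above can also be built programmatically, which avoids typos in the long query string. A minimal Python sketch, assuming the config was already uploaded as my-config; the helper names are hypothetical. It also computes how many live nodes the request needs: in Solr 5.x, CREATE refuses to place more than maxShardsPerNode replicas on one node, so numShards * replicationFactor replicas need at least that number divided by maxShardsPerNode (rounded up) nodes, which is 2 for the example above.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, config_name: str,
                          num_shards: int, replication_factor: int,
                          max_shards_per_node: int) -> str:
    """Collections API CREATE request that reuses a config already stored in ZooKeeper."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def min_nodes_needed(num_shards: int, replication_factor: int, max_shards_per_node: int) -> int:
    """Smallest node count that can host every replica without exceeding maxShardsPerNode."""
    replicas = num_shards * replication_factor
    return -(-replicas // max_shards_per_node)  # ceiling division
```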
Download the latest version of the Apache Solr Reference Guide:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-5.1.pdf
Check this section in the PDF:
Configuration Directories and SolrCloud
Since you are not specifying a configset, the default one is used:
First, if you don't provide the -d or -n options, then the default configuration ($SOLR_HOME/server/solr/configsets/data_driven_schema_configs/conf) is uploaded to ZooKeeper using the same name as the collection.
