adding document to collection failed in Solr cloud mode - solr

I need to add all the documents in a folder to a collection, but the command fails.
Here is my setup:
hostname: mysolr
Solr Admin URL: http://mysolr.net:8983/solr/#/
Collection name: collection_indexer
Collection url: http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1
data folder:
/tmp/solr_data
Running folder:
bash-4.1$ pwd
/opt/cloudera/parcels/CDH/jars
command:
java -Dtype=application/json -Drecursive -Durl="http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1/update/json/docs" -jar post.jar /tmp/solr_data
Output:
bash-4.1$ java -Dtype=application/json -Drecursive -Durl="http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1/update/json/docs" -jar post.jar /tmp/solr_data
SimplePostTool version 1.5
Posting files to base url http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1/update/json/docs using content-type application/json..
Entering recursive mode, max depth=999, delay=0s
Indexing directory /tmp/solr_data (1 files, depth=0)
POSTing file test.json
SimplePostTool: WARNING: Solr returned an error #405 (Method Not Allowed) for url: http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1/update/json/docs
SimplePostTool: WARNING: Response: Apache Tomcat/6.0.45 - Error report. HTTP Status 405 - HTTP method POST is not supported by this URL. type: Status report. message: HTTP method POST is not supported by this URL. description: The specified HTTP method is not allowed for the requested resource. Apache Tomcat/6.0.45
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 405 for URL: http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1/update/json/docs
1 files indexed.
COMMITting Solr index changes to http://mysolr.net:8983/solr/#/collection_indexer_shard1_replica1/update/json/docs..
Time spent: 0:00:00.100
I also tried http://mysolr.net:8983/solr/#/collection_indexer/update/json/docs as the -Durl value and got the same error message.
The end of the error message seems to hint that the problem lies with the URL or the REST call. Can you please clarify what is missing here?
Thank you very much.
Update 20180415 7:07am EST:
Following MatsLindh's comment below, I changed the command and ran it again:
java -Dtype=application/json -Drecursive -Durl="http://dsnyr001d01i1d.nam.nsroot.net:8983/solr/collection_indexer_shard1_replica1/update" -jar post.jar /tmp/solr_data
SimplePostTool version 1.5
Posting files to base url http://mysolr.net:8983/solr/collection_indexer_shard1_replica1/update using content-type application/json..
Entering recursive mode, max depth=999, delay=0s
Indexing directory /tmp/solr_data (1 files, depth=0)
POSTing file test.json
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://mysolr.net:8983/solr/collection_indexer_shard1_replica1/update
SimplePostTool: WARNING: Response: {"responseHeader":{"status":400,"QTime":0},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"Unknown command: region [9]","code":400}}
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://mysolr.net:8983/solr/collection_indexer_shard1_replica1/update
1 files indexed.
COMMITting Solr index changes to http://mysolr.net:8983/solr/collection_indexer_shard1_replica1/update..
Time spent: 0:00:00.100
This time it is a 400 error.
In Solr Admin I still do not see the new document.
Thank you.

Any part after a # in a standard HTTP URL is a fragment (anchor), meant for consumption on the client, usually by scrolling to the element that has the anchor as its id. These days it's used more for keeping state in browser applications (in particular before the history state API was introduced).
The important part is that anything after the # is never transmitted to the server. It's only used by the client, either to scroll the page or by JavaScript in the browser to handle state (in this case, which page of the admin UI you were looking at).
Since the fragment is never transmitted, you end up making a request to http://mysolr.net:8983/solr/, which isn't the URL you want to post to.
Drop the anchor and use the actual collection update URL: http://mysolr.net:8983/solr/collection_indexer/update/json/docs should work.
You should not have to use the direct core URL (i.e. the one with the shard/replica suffix) if you're running in cloud mode.
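To see what the server actually receives, here is a minimal Python sketch that splits the two URLs from the question using the standard library. The helper name server_side_url is made up for illustration; it only mimics what an HTTP client does when it drops the fragment before sending the request.

```python
from urllib.parse import urlsplit


def server_side_url(url: str) -> str:
    """Return the URL without its fragment, i.e. what reaches the server."""
    parts = urlsplit(url)
    # Everything after '#' is client-side only, so blank it out.
    return parts._replace(fragment="").geturl()


admin_style = "http://mysolr.net:8983/solr/#/collection_indexer/update/json/docs"
update_url = "http://mysolr.net:8983/solr/collection_indexer/update/json/docs"

print(urlsplit(admin_style).path)      # /solr/
print(urlsplit(admin_style).fragment)  # /collection_indexer/update/json/docs
print(server_side_url(admin_style))    # http://mysolr.net:8983/solr/
print(server_side_url(update_url))     # same as update_url, nothing to strip
```

The admin-style URL collapses to the Solr root, which matches the 405 in the question: the POST never reached an update handler.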

I finally sorted it out with the steps below (the exceptions were showing up in the log under /var/log/solr):
1. Manually create a new schema file specifically for the data format
2. Update the instance directory and reload the collection so the new schema is picked up
solrctl instancedir --update
solrctl collection --reload
3. Make sure the id of each doc is unique
Many thanks to MatsLindh for the pointers. It is much appreciated.
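For step 3, a quick pre-flight check can catch duplicate ids before posting. The following is only a sketch under assumptions not stated in the thread: each file under the data folder holds either one JSON object or an array of objects, and the unique key field is called "id". Both function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_docs(folder):
    """Yield every JSON document found in *.json files under folder."""
    for path in sorted(Path(folder).rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: a file holds either one object or a list of objects.
        yield from (data if isinstance(data, list) else [data])


def duplicate_ids(docs, id_field="id"):
    """Return the sorted id values that occur more than once."""
    counts = Counter(
        str(doc[id_field])
        for doc in docs
        if isinstance(doc, dict) and id_field in doc
    )
    return sorted(value for value, n in counts.items() if n > 1)
```

Calling duplicate_ids(load_docs("/tmp/solr_data")) on the folder from the question would list any ids that need fixing; an empty list means every id is unique.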

Related

Solr post command always fails with WARNING: Solr return an error 404 for url: http://localhost:8983/solr/core-name/update/extract... for HTML files

Linux Mint 20.1
Apache Solr 8.11.1
I am able to post XML documents from the examples subdirectory, such as ipod_other.xml, but not a simple, well-formed HTML file that I added to that subdirectory. I am testing Solr this way because I anticipate indexing HTML documents. (Note that this is my first Solr rodeo.)
~/dev/solr-8.11.1/example/exampledocs $ ../../bin/post -c gettingstarted sample.html
/home/russ/dev/jdk-11.0.10+9/bin/java -classpath /home/russ/dev/solr-8.11.1/dist/solr-core-8.11.1.jar -Dauto=yes -Dc=gettingstarted -Ddata=files org.apache.solr.util.SimplePostTool sample.html
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file sample.html (text/html) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/gettingstarted/update/extract?resource.name=%2Fhome%2Fruss%2Fdev%2Fsolr-8.11.1%2Fexample%2Fexampledocs%2Fsample.html&literal.id=%2Fhome%2Fruss%2Fdev%2Fsolr-8.11.1%2Fexample%2Fexampledocs%2Fsample.html
In server/solr/gettingstarted/conf/solrconfig.xml I have added:
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
as suggested elsewhere, but it makes no difference whether the line is present or not.
I am able to access http://localhost:8983/solr/#/ and http://localhost:8983/solr/#/gettingstarted/core-overview as well as run queries.

Error 404 for the url http://localhost:8081/jars/:jarid/run

I have a jar file that is already uploaded to a Flink cluster. I'm using Flink 1.6.0.
Here is the result after I uploaded the jar file:
{
  "address": "http://localhost:8081",
  "files": [
    {
      "id": "1d6dc437-bd5f-4147-a37e-b1d40d425a99_NicoWordCount.jar",
      "name": "NicoWordCount.jar",
      "uploaded": 1537174925000,
      "entry": [
        {
          "name": "WordCount",
          "description": null
        }
      ]
    }
  ]
}
When I request the following URL
"http://localhost:8081/jars/1d6dc437-bd5f-4147-a37e-b1d40d425a99_NicoWordCount.jar/run"
it returns: Failure: 404 Not Found
When I request
"http://localhost:8081/jars/1d6dc437-bd5f-4147-a37e-b1d40d425a99_NicoWordCount.jar/plan"
it returns a result.
When I run NicoWordCount.jar from the Flink dashboard, it also runs well and gives the expected result.
What am I doing wrong?
Which HTTP method are you using?
The run endpoint must be called with POST. For more information on Flink's REST API, check this doc.
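As an illustration, here is a small Python sketch that builds a POST request for the run endpoint with the standard library. The function name build_run_request is hypothetical, and the empty request body is an assumption (job parameters are not covered in this thread). The network call itself is left commented out so the snippet runs without a cluster.

```python
import urllib.request


def build_run_request(base_url, jar_id):
    """Build a POST request for the /jars/:jarid/run endpoint."""
    url = f"{base_url}/jars/{jar_id}/run"
    # Set the method explicitly; opening the URL in a browser sends GET instead.
    return urllib.request.Request(url, data=b"", method="POST")


req = build_run_request(
    "http://localhost:8081",
    "1d6dc437-bd5f-4147-a37e-b1d40d425a99_NicoWordCount.jar",
)
print(req.get_method(), req.full_url)

# To submit the job against a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```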

Querying Solr Config API returns Internal Server Error

I'm attempting to update my Solr config via the Solr Config API. As a first step I am trying to query the config with the following endpoint:
http://localhost:8983/solr//config
The response I get back is 500 Internal Server Error, and I noticed the following exception in the logs of the deployed Solr:
Internal Server Error (500) - No RestManager found!
at org.apache.solr.rest.RestManager.getRestManager(RestManager.java:245)
at org.apache.solr.rest.SolrConfigRestApi.createInboundRoot(SolrConfigRestApi.java:67)
at org.restlet.Application.getInboundRoot(Application.java:270)
at org.restlet.engine.application.ApplicationHelper.start(ApplicationHelper.java:127)
at org.restlet.Application.start(Application.java:582)
The core was created using the following post:
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=<keyspace.table>&generateResources=true&reindex=true"
This action was successful but a GET to the config api fails.
The url should be: http://localhost:8983/solr/#/[ks.cf]/config
Still, there are convenience methods, which are the recommended way, that spare you all the curl commands. See:
dsetool create_core
dsetool get_core_config
dsetool get_core_schema
dsetool reload_core
In your case, running dsetool create_core and dsetool get_core_config, modifying the solrconfig.xml, and then running dsetool reload_core with the new solrconfig should get you what you need. See the dsetool docs for usage details.
If you still need to use curl and want some examples, see the shell scripts under the wikipedia demo, 1-add-schema.sh for instance. Still, the dsetool commands are the recommended method.
If you intended to use the Solr Config API, please note that it is a Solr 5 feature. DSE ships Solr 4.10, so that feature is not available yet. To change the Solr config you will have to reload the core with the new solrconfig using the dsetool commands outlined above.
It would be very useful to know the exact version you are using.
Hope it helps.

uploaded data is not visible in solr

I am using Solr 4.7. I created a new core by copying "collection1" (the default example provided by Solr) under a different name, say "wiki", and updated core.properties with the new name. The new core is now visible in the Solr admin panel.
After starting Solr, I try to import data into the new core as shown below.
$ java -jar post.jar ../../../enwiki-20150602-pages-articles1.xml -Durl='http://localhost:8983/solr/#/wiki/update'
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update using content-type application/xml..
POSTing file enwiki-20150602-pages-articles1.xml
SimplePostTool: WARNING: No files or directories matching -Durl=http:/localhost:8983/solr/#/wiki/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update..
Time spent: 0:00:03.671
I also tried
$ java -jar post.jar ../../../enwiki-20150602-pages-articles1.xml
But when I query in the Solr admin panel I still do not get any data. So my question is: if the data has been indexed, why can't I see it? What exactly am I doing wrong?
Not sure whether this has been resolved.
I had the exact same problem. You need to specify the full path of the file to be ingested.
C:\test-solr>java -Durl=http://localhost:8983/solr/testdemo/update -Dtype=text/csv -jar C:/test-solr/exampledocs/post.jar "C:\test-solr\exampledocs\ingestMeFile.csv"
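One more detail stands out when comparing this command with the one in the question: here the -D options come before -jar. The java launcher treats everything after the jar file as program arguments, which is consistent with the question's output, where the tool reported "-Durl=..." as a missing file and posted to the default http://localhost:8983/solr/update. A minimal Python sketch of building the command in the right order follows; the helper name post_jar_command is hypothetical, and the paths are the ones from this answer.

```python
import shlex


def post_jar_command(jar, files, url, content_type=None):
    """Build a java command line with JVM system properties before -jar."""
    cmd = ["java", f"-Durl={url}"]
    if content_type:
        cmd.append(f"-Dtype={content_type}")
    # Anything placed after the jar path is passed to the tool as a file argument.
    cmd += ["-jar", jar, *files]
    return cmd


cmd = post_jar_command(
    "C:/test-solr/exampledocs/post.jar",
    ["C:/test-solr/exampledocs/ingestMeFile.csv"],
    url="http://localhost:8983/solr/testdemo/update",
    content_type="text/csv",
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run as is, which avoids shell quoting problems with the URL.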

Need help understanding Solr

I'm just getting started with Nutch and Solr. I ran the crawl once with just one seed URL.
I ran this command:
bin/nutch crawl urls -dir crawl -solr http://localhost:8983/solr/ -depth 3 -topN 5
Everything runs without errors, so I'm assuming Solr has indexed the pages. How do I go about searching now? I went to localhost:8983/solr/admin/, but when I enter a search query and click search I get this:
HTTP ERROR 400
Problem accessing /solr/select/.
Reason: undefined field text
I also tried an example from the tutorial but when I run this command:
java -jar post.jar solr.xml monitor.xml
I get this:
SimplePostTool: version 1.4
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file solr.xml
SimplePostTool: FATAL: Solr returned an error #400 ERROR: [doc=SOLR1000] unknown field 'name'
My ultimate goal is to somehow add this data into Accumulo and use it for a search engine.
I'm assuming you are using Nutch 1.4 or later. If that is the case, you need to change the type of the fields you added in the solr/conf/schema.xml file from "text" to "text_general", without the quotes.
I am working towards a similar goal right now and used that fix to at least get Solr working properly, although I still cannot get Solr to search the indexed sites. Hope this helps; let me know if you get it working.

Resources