Error while indexing .xml files in Solr

I am trying to index XML files in the Solr search engine using the following command:
java -Durl=http://10.1.11.143:8080/solr/#/ -jar post.jar solr.xml
But I am getting the following error:
SimplePostTool version 1.5
Posting files to base url http://10.1.11.143:8080/solr/#/ using content-type application/xml..
POSTing file solr.xml
SimplePostTool: WARNING: Solr returned an error #500 Internal Server Error
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 500 for URL: http://10.1.11.143:8080/solr/#/
1 files indexed.
COMMITting Solr index changes to http://10.1.11.143:8080/solr/#/..
SimplePostTool: WARNING: Solr returned an error #500 Internal Server Error for url http://10.1.11.143:8080/solr/#/?commit=true
Time spent: 0:00:00.017
Please help me resolve this error.
(The content of solr.xml was shown in an attached image.)

The issue is with the URL: you didn't specify a requestHandler for the update. Use the following command; it will work.
java -Durl=http://10.1.11.143:8080/solr/update?commit=true -jar post.jar solr.xml
/update is the requestHandler to index data into Solr.
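For comparison, the same update can be issued by hand. A minimal sketch using curl, assuming the default single-core setup from the question with the stock /update handler:

# POST the XML document straight to the update handler and commit
curl "http://10.1.11.143:8080/solr/update?commit=true" -H "Content-Type: application/xml" --data-binary @solr.xml

Note that everything after # in a URL is a client-side fragment that is never sent to the server, so posting to http://10.1.11.143:8080/solr/#/ (the admin UI address) effectively POSTs to /solr/ itself, which is not an update endpoint; hence the 500.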

Related

Solr / SimplePostTool cannot index any file: 503 error

I am new to Solr and trying to figure out the basics of indexing one file. I've started with this tutorial: http://lucene.apache.org/solr/quickstart.html, but being on Windows I am hitting a wall when it comes to running the command to index.
This is what my output looks like:
SolrCloud example is running, please visit http://localhost:8983/solr
C:\Projects\solr-5.1.0>java -Dc=gettingstarted -Dtype=text/csv -Dfiletypes=cs -jar example/exampledocs/post.jar Z:/Indexer/tfs/ShippingOptionsPerRecipient.aspx.cs
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update using content-type text/csv...
POSTing file ShippingOptionsPerRecipient.aspx.cs to [base]
SimplePostTool: WARNING: Solr returned an error #503 (Service Unavailable) for url: http://localhost:8983/solr/gettingstarted/update
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">503</int><int name="QTime">4057</int></lst><lst name="error"><str name="msg">No registered leader was found after waiting for 4000ms , collection: gettingstarted slice: shard1</str><int name="code">503</int></lst>
</response>
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 503 for URL: http://localhost:8983/solr/gettingstarted/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update...
Time spent: 0:00:04.255
Unfortunately, I can't find much documentation on any of these errors.
Any help would be appreciated.
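One way to start digging into the "No registered leader" message is to ask the Collections API for the cluster state. A minimal sketch, assuming the quickstart's SolrCloud example is still running on the default port:

# check whether shard1 of "gettingstarted" actually has a live leader
curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=gettingstarted&wt=json"

If every replica of a shard reports "down" or "recovering", there is no leader to accept updates, which is exactly the condition this 503 describes.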

Solr - Indexing error with UTF-8 characters

I am 100% new to Solr. I installed solr-5.1 for Windows and followed the tutorial.
I need some direction as to what may have caused the error below, e.g. whether I need to add config to a core XML file, whether it is a UTF-8 encoding problem, etc.
start Solr with: solr.cmd -start
create a core: solr create -c myExample
index PDF files: java -Dc=myexample -Dfiletypes=pdf -jar ../example/exampledocs/post.jar E:\solr_docs\*.pdf
Errors:
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/myExample/update using content-type application/xml...
POSTing file Intrusion detection by machine learning.pdf to [base]
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/myExample/update
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response><lst name="responseHeader">
<int name="status">400</int><int name="QTime">0</int>
</lst><lst name="error"><str name="msg">Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)</str><int name="code">400</int></lst>
</response>
You are feeding Solr a PDF file as if it were a plain text/XML file. You need to configure and use a suitable URP chain, or Solr Cell's extracting request handler, to have Solr ingest PDF files.
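As an illustration, here is a minimal sketch of pushing the same PDF through Solr Cell's extracting handler instead of the plain /update endpoint. It assumes the stock /update/extract handler is enabled in solrconfig.xml (it is in the 5.1 sample configs), and literal.id is an arbitrary placeholder:

# let Tika parse the PDF server-side via the extracting request handler
curl "http://localhost:8983/solr/myExample/update/extract?literal.id=pdf1&commit=true" -F "file=@Intrusion detection by machine learning.pdf"

Alternatively, running post.jar with -Dauto=yes lets SimplePostTool detect the .pdf extension and route the file to /update/extract itself.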

Integrating Solr with Nutch issue

I am following a tutorial from here. I have Solr and Nutch installed separately and they both work fine on their own. The problem comes when I have to integrate them. From earlier posts on this site I learned that there could be an issue with the schema files. As the tutorial says, I copied Nutch's schema.xml over Solr's schema.xml and restarted Solr; Solr stopped because of configuration issues. So I simply copied the contents of each file into the other along with the existing content. Now (and previously as well) I get this error:
Indexer: starting at 2014-08-05 11:10:21
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
Can someone suggest what should be done?
I am using apache-nutch-1.8 and solr-4.9.0.
Here is what my hadoop.log file looks like:
2014-08-05 12:50:05,032 INFO crawl.Injector - Injector: starting at 2014-08-05 12:50:05
2014-08-05 12:50:05,033 INFO crawl.Injector - Injector: crawlDb: -dir/crawldb
2014-08-05 12:50:05,033 INFO crawl.Injector - Injector: urlDir: urls
...
2014-08-05 13:04:21,255 INFO solr.SolrIndexWriter - Indexing 1 documents
2014-08-05 13:04:21,286 WARN mapred.LocalJobRunner - job_local1310160376_0001
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://my-solr-url:8983/solr/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2014-08-05 13:04:21,544 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
2014-08-05 13:10:37,855 INFO crawl.Injector - Injector: starting at 2014-08-05 13:10:37
...
Maybe because of versioning differences: the tutorial said to copy conf/schema.xml, whereas with this Nutch/Solr combination the file conf/schema-solr4.xml is the one to copy, followed by adding <field name="_version_" type="long" indexed="true" stored="true"/> at line 351. Restart Solr with java -jar start.jar and it all works normally. Hope this helps someone!
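For reference, a sketch of that copy step from the command line, assuming Nutch 1.8 and Solr 4.9.0 are unpacked side by side (adjust the paths to your layout):

# replace the example core's schema with the Solr-4-compatible one from Nutch
cp apache-nutch-1.8/conf/schema-solr4.xml solr-4.9.0/example/solr/collection1/conf/schema.xml
# add the _version_ field described above to the copied file, then restart:
cd solr-4.9.0/example
java -jar start.jar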

Nutch 1.3 and Solr 4.4.0 integration: Job failed

I am trying to crawl the web using Nutch, and I followed the documentation steps on Nutch's official website (ran the crawl successfully, copied schema-solr4.xml into the Solr directory). But when I run
bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
I get the following error:
Indexer: starting at 2013-08-25 09:17:35
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)
I have to mention that Solr is running, but I cannot browse http://localhost:8983/solr/admin (it redirects me to http://localhost:8983/solr/#).
On the other hand, when I stop Solr, I get the same error! Does anybody have any idea about what is wrong with my setup?
P.S. The URL that I crawl is: http://localhost/NORC
Check your configuration against: Solr and Nutch.
Nutch's and Solr's schema files should be the same, or you may encounter problems, so make sure they match up.
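As a quick sanity check, a hedged sketch of pointing solrindex at an explicit core rather than the bare /solr/ root (collection1 is the default core name in the Solr 4.x example; substitute your own):

bin/nutch solrindex http://localhost:8983/solr/collection1 crawl/crawldb -linkdb crawl/linkdb crawl/segments/*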
When I met the same problem with Nutch, Solr's log showed the error message "unknown field host".
After modifying the schema.xml in Solr, the Nutch error vanished.
You are missing the name of the core in your command.
e.g.:
./bin/crawl -i -D solr.server.url=http://localhost:8983/solr/your_corename urls/ crawl 1

Error while indexing data crawled by Nutch into Solr

I have started working with Nutch and Solr, and I have a problem with integrating Solr with Nutch. I followed this tutorial: http://wiki.apache.org/nutch/NutchTutorial and after using:
bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
nutch shows message:
java.io.IOException: Job failed!
and solr is showing:
SEVERE: org.apache.solr.common.SolrException: ERROR:
[doc=http://nutch.apache.org/] unknown field 'host'
I thought that the reason might be a missing 'host' field in the $SOLR_HOME/example/solr/conf/schema.xml but it is there.
I would be very grateful for your help.
Changing the configuration on the Nutch side does not affect Solr's schema. You have to define that field in Solr's schema.xml.
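For illustration, a minimal sketch of the missing field definition, added inside the <fields> section of $SOLR_HOME/example/solr/conf/schema.xml. The field type here is an assumption: Nutch's bundled schema declares host with a custom "url" field type, and string works as a simple stand-in:

<!-- field expected by Nutch's solrindex-mapping.xml; "string" type is a stand-in -->
<field name="host" type="string" stored="false" indexed="true"/>

Restart or reload Solr after editing the schema so the new field takes effect.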
