Error while indexing .xml files in Solr

I am trying to index XML files in the Solr search engine using the following command:
java -Durl=http://10.1.11.143:8080/solr/#/ -jar post.jar solr.xml
But I am getting the following error:
SimplePostTool version 1.5
Posting files to base url http://10.1.11.143:8080/solr/#/ using content-type application/xml..
POSTing file solr.xml
SimplePostTool: WARNING: Solr returned an error #500 Internal Server Error
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 500 for URL: http://10.1.11.143:8080/solr/#/
1 files indexed.
COMMITting Solr index changes to http://10.1.11.143:8080/solr/#/..
SimplePostTool: WARNING: Solr returned an error #500 Internal Server Error for url http://10.1.11.143:8080/solr/#/?commit=true
Time spent: 0:00:00.017
Please help me resolve this error.
(The content of solr.xml was shown in an attached image.)

The issue is with the URL: you didn't specify a requestHandler for the update. Use the following command; it will work.
java -Durl=http://10.1.11.143:8080/solr/update?commit=true -jar post.jar solr.xml
/update is the requestHandler to index data into Solr.
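For comparison, the same update can be issued by hand. A minimal sketch using curl, assuming the default single-core setup from the question with the stock /update handler:

# POST the XML document straight to the update handler and commit
curl "http://10.1.11.143:8080/solr/update?commit=true" -H "Content-Type: application/xml" --data-binary @solr.xml

Note that everything after # in a URL is a client-side fragment that is never sent to the server, so posting to http://10.1.11.143:8080/solr/#/ (the admin UI address) effectively POSTs to /solr/ itself, which is not an update endpoint; hence the 500.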

Related

Solr / SimplePostTool cannot index any file: 503 error

I am new to Solr and trying to figure out the basics of indexing one file. I've started with this tutorial: http://lucene.apache.org/solr/quickstart.html, but being on Windows I am hitting a wall when it comes to running the command to index.
This is what my output looks like:
SolrCloud example is running, please visit http://localhost:8983/solr
C:\Projects\solr-5.1.0>java -Dc=gettingstarted -Dtype=text/csv -Dfiletypes=cs -jar example/exampledocs/post.jar Z:/Indexer/tfs/ShippingOptionsPerRecipient.aspx.cs
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update using content-type text/csv...
POSTing file ShippingOptionsPerRecipient.aspx.cs to [base]
SimplePostTool: WARNING: Solr returned an error #503 (Service Unavailable) for url: http://localhost:8983/solr/gettingstarted/update
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">503</int><int name="QTime">4057</int></lst><lst name="error"><str name="msg">No registered leader was found after waiting for 4000ms , collection: gettingstarted slice: shard1</str><int name="code">503</int></lst>
</response>
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 503 for URL: http://localhost:8983/solr/gettingstarted/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update...
Time spent: 0:00:04.255
Unfortunately, I can't find much documentation on any of these errors.
Any help would be appreciated.
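One way to start digging into the "No registered leader" message is to ask the Collections API for the cluster state. A minimal sketch, assuming the quickstart's SolrCloud example is still running on the default port:

# check whether shard1 of "gettingstarted" actually has a live leader
curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=gettingstarted&wt=json"

If every replica of a shard reports "down" or "recovering", there is no leader to accept updates, which is exactly the condition this 503 describes.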

Solr - Indexing error with UTF-8 characters

I am 100% new to Solr. I installed solr-5.1 for Windows and followed the tutorial.
I need some direction as to what may have caused the error below, e.g. whether I need to add config to a core XML file, whether it is a UTF-8 encoding problem, etc.
start Solr with: solr.cmd -start
create a core: solr create -c myExample
index PDF files: java -Dc=myexample -Dfiletypes=pdf -jar ../example/exampledocs/post.jar E:\solr_docs\*.pdf
Errors:
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/myExample/update using content-type application/xml...
POSTing file Intrusion detection by machine learning.pdf to [base]
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/myExample/update
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response><lst name="responseHeader">
<int name="status">400</int><int name="QTime">0</int>
</lst><lst name="error"><str name="msg">Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)</str><int name="code">400</int></lst>
</response>
You are feeding Solr a PDF file as if it were a plain text/XML file. You need to configure and use a suitable URP chain, or Solr Cell's extracting request handler, to have Solr ingest PDF files.
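As an illustration, here is a minimal sketch of pushing the same PDF through Solr Cell's extracting handler instead of the plain /update endpoint. It assumes the stock /update/extract handler is enabled in solrconfig.xml (it is in the 5.1 sample configs), and literal.id is an arbitrary placeholder:

# let Tika parse the PDF server-side via the extracting request handler
curl "http://localhost:8983/solr/myExample/update/extract?literal.id=pdf1&commit=true" -F "file=@Intrusion detection by machine learning.pdf"

Alternatively, running post.jar with -Dauto=yes lets SimplePostTool detect the .pdf extension and route the file to /update/extract itself.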

Integrating Solr with Nutch issue

I am following a tutorial from here. I have Solr and Nutch installed separately and they both work fine on their own. The problem comes when I have to integrate them. From earlier posts on this site I learned that there could be an issue with the schema files. As the tutorial says, I copied Nutch's schema.xml over Solr's schema.xml and restarted Solr; Solr stopped because of configuration issues. So I simply copied the contents of each file into the other along with the existing content. Now (and previously as well) I get this error:
Indexer: starting at 2014-08-05 11:10:21
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
Can someone suggest what should be done?
I am using apache-nutch-1.8 and solr-4.9.0.
Here is what my hadoop.log file looks like:
2014-08-05 12:50:05,032 INFO crawl.Injector - Injector: starting at 2014-08-05 12:50:05
2014-08-05 12:50:05,033 INFO crawl.Injector - Injector: crawlDb: -dir/crawldb
2014-08-05 12:50:05,033 INFO crawl.Injector - Injector: urlDir: urls
...
2014-08-05 13:04:21,255 INFO solr.SolrIndexWriter - Indexing 1 documents
2014-08-05 13:04:21,286 WARN mapred.LocalJobRunner - job_local1310160376_0001
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://my-solr-url:8983/solr/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2014-08-05 13:04:21,544 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
2014-08-05 13:10:37,855 INFO crawl.Injector - Injector: starting at 2014-08-05 13:10:37
...
Maybe because of versioning differences: the tutorial said to copy conf/schema.xml, whereas with this Nutch/Solr combination the file conf/schema-solr4.xml is the one to copy, followed by adding <field name="_version_" type="long" indexed="true" stored="true"/> at line 351. Restart Solr with java -jar start.jar and it all works normally. Hope this helps someone!
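For reference, a sketch of that copy step from the command line, assuming Nutch 1.8 and Solr 4.9.0 are unpacked side by side (adjust the paths to your layout):

# replace the example core's schema with the Solr-4-compatible one from Nutch
cp apache-nutch-1.8/conf/schema-solr4.xml solr-4.9.0/example/solr/collection1/conf/schema.xml
# add the _version_ field described above to the copied file, then restart:
cd solr-4.9.0/example
java -jar start.jar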

Nutch 1.3 and Solr 4.4.0 integration: Job failed

I am trying to crawl the web using Nutch, and I followed the documentation steps on Nutch's official website (ran the crawl successfully, copied schema-solr4.xml into the Solr directory). But when I run
bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
I get the following error:
Indexer: starting at 2013-08-25 09:17:35
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)
I have to mention that Solr is running, but I cannot browse http://localhost:8983/solr/admin (it redirects me to http://localhost:8983/solr/#).
On the other hand, when I stop Solr, I get the same error! Does anybody have any idea about what is wrong with my setup?
P.S. The URL that I crawl is: http://localhost/NORC
Check your configuration against: Solr and Nutch.
Nutch's and Solr's schema files should be the same, or you may encounter problems, so make sure they match up.
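As a quick sanity check, a hedged sketch of pointing solrindex at an explicit core rather than the bare /solr/ root (collection1 is the default core name in the Solr 4.x example; substitute your own):

bin/nutch solrindex http://localhost:8983/solr/collection1 crawl/crawldb -linkdb crawl/linkdb crawl/segments/*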
When I met the same problem with Nutch, Solr's log showed the error message "unknown field host".
After modifying the schema.xml in Solr, the Nutch error vanished.
You are missing the name of the core in your command.
e.g.:
./bin/crawl -i -D solr.server.url=http://localhost:8983/solr/your_corename urls/ crawl 1

Error while indexing data crawled by Nutch into Solr

I have started working with Nutch and Solr, and I have a problem with integrating Solr with Nutch. I followed this tutorial: http://wiki.apache.org/nutch/NutchTutorial and after using:
bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
nutch shows message:
java.io.IOException: Job failed!
and solr is showing:
SEVERE: org.apache.solr.common.SolrException: ERROR:
[doc=http://nutch.apache.org/] unknown field 'host'
I thought that the reason might be a missing 'host' field in the $SOLR_HOME/example/solr/conf/schema.xml but it is there.
I would be very grateful for your help.
Changing the configuration on the Nutch side does not affect Solr's schema. You have to define that field in Solr's schema.xml.
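For illustration, a minimal sketch of the missing field definition, added inside the <fields> section of $SOLR_HOME/example/solr/conf/schema.xml. The field type here is an assumption: Nutch's bundled schema declares host with a custom "url" field type, and string works as a simple stand-in:

<!-- field expected by Nutch's solrindex-mapping.xml; "string" type is a stand-in -->
<field name="host" type="string" stored="false" indexed="true"/>

Restart or reload Solr after editing the schema so the new field takes effect.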
