Solr SimplePostTool cannot index any file: 503 error

I am new to Solr and trying to figure out the basics of indexing one file. I started with this tutorial, http://lucene.apache.org/solr/quickstart.html, but being on Windows I am hitting a wall when running the indexing command.
This is what my output looks like:
SolrCloud example is running, please visit http://localhost:8983/solr"
C:\Projects\solr-5.1.0>java -Dc=gettingstarted -Dtype=text/csv -Dfiletypes=cs -jar example/exampledocs/post.jar Z:/Indexer/tfs/ShippingOptionsPerRecipient.aspx.cs
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update using content-type text/csv...
POSTing file ShippingOptionsPerRecipient.aspx.cs to [base]
SimplePostTool: WARNING: Solr returned an error #503 (Service Unavailable) for url: http://localhost:8983/solr/gettingstarted/update
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">503</int><int name="QTime">4057</int></lst><lst name="error"><str name="msg">No registered leader was found after waiting for 4000ms , collection: gettingstarted slice: shard1</str><int name="code">503</int></lst>
</response>
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 503 for URL: http://localhost:8983/solr/gettingstarted/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update...
Time spent: 0:00:04.255
Unfortunately, I can't find much documentation on any of these errors.
Any help would be appreciated.

Related

HTML sample file not indexing in Solr 8.8

I am trying to index the exampledocs in the examples folder with the SimplePostTool on Windows 10 using Solr 8.8. All the documents index except sample.html. For that file I get the following error:
PS C:\solr-8.8.0> java -jar -Dc=gettingstarted -Dauto example\exampledocs\post.jar example\exampledocs\sample.html
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file sample.html (text/html) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/gettingstarted/update/extract?resource.name=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html&literal.id=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/solr/gettingstarted/update/extract</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>
</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/gettingstarted/update/extract?resource.name=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html&literal.id=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update...
Time spent: 0:00:00.086
However, JSON and all other file types index with no problem. For example:
PS C:\solr-8.8.0> java -jar -Dc=gettingstarted -Dauto example\exampledocs\post.jar example\exampledocs\books.json
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.json (application/json) to [base]/json/docs
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update...
I am just following this tutorial: https://lucene.apache.org/solr/guide/8_8/post-tool.html#post-tool-windows-support
The extracting request handler, which allows indexing of rich documents, has to be enabled before it can be used. If you look at the paths in both of your requests, you can see that the first one goes to /update/extract and gets a 404, while the second one goes to /update/json/docs and works.
You can find a description of how to enable and configure the endpoint in the Solr documentation:
If you are not working with an example configset, the jars required to use Solr Cell will not be loaded automatically. You will need to configure your solrconfig.xml to find the ExtractingRequestHandler and its dependencies:
<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />
You can then configure the ExtractingRequestHandler in solrconfig.xml. The following is the default configuration found in Solr’s _default configset, which you can modify as needed:
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="fmap.content">_text_</str>
</lst>
</requestHandler>
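To confirm whether a given solrconfig.xml actually declares the handler, a quick offline check is possible. The following is only a sketch: the helper name `has_request_handler` and the `SAMPLE` config text are made up for illustration, and it looks only at the XML text, not at whether the Solr Cell jars are loadable.

```python
import xml.etree.ElementTree as ET


def has_request_handler(solrconfig_xml: str, name: str = "/update/extract") -> bool:
    """Return True if the config text declares a <requestHandler> with this name."""
    root = ET.fromstring(solrconfig_xml)
    return any(h.get("name") == name for h in root.iter("requestHandler"))


# Hypothetical, trimmed-down config used only to exercise the helper.
SAMPLE = """<config>
  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.content">_text_</str>
    </lst>
  </requestHandler>
</config>"""

if __name__ == "__main__":
    print(has_request_handler(SAMPLE))
    print(has_request_handler("<config/>"))
```

If the check comes back false for the config your collection uses, a 404 on /update/extract is the expected result.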

Solr - Indexing error with UTF-8 characters

I am 100% new to Solr. I installed solr-5.1 for Windows and followed the tutorial.
I need some direction as to what may have caused the error below, e.g. need to add config to core xml file, UTF-8 encoding problem, etc...
start solr with :] solr.cmd -start
create a core :] solr create -c myExample
index pdf files :] java -Dc=myexample -Dfiletypes=pdf -jar ../example/exampledocs/post.jar E:\solr_docs\*.pdf
Errors:
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/myExample/update using content-type application/xml...
POSTing file Intrusion detection by machine learning.pdf to [base]
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/myExample/update
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response><lst name="responseHeader">
<int name="status">400</int><int name="QTime">0</int>
</lst><lst name="error"><str name="msg">Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)</str><int name="code">400</int></lst>
</response>
You are feeding Solr a PDF file as if it were a text file (the log shows it being posted with content-type application/xml). You need to configure and use a suitable URP chain to have Solr work with PDF files.
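To see why binary PDF content produces this kind of error, here is a small sketch. It assumes the file starts the way many PDFs do, with a version line followed by a comment line of high-bit bytes; the byte string below is an illustrative stand-in, not the asker's file, and `is_valid_utf8` is a hypothetical helper.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Assumed PDF-style header: "%PDF-1.4", newline, then a binary comment line.
# 0xE2 opens a multi-byte UTF-8 sequence, but 0xE3 is not a valid middle byte.
PDF_LIKE_HEADER = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# A well-formed XML update message, which is what a text/XML parser expects.
XML_DOC = '<add><doc><field name="id">1</field></doc></add>'.encode("utf-8")

if __name__ == "__main__":
    print(is_valid_utf8(PDF_LIKE_HEADER))
    print(is_valid_utf8(XML_DOC))
```

A parser that expects UTF-8 XML fails on such bytes almost immediately, which is consistent with the error being reported near the start of the input.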

Error while indexing .xml files in solr

I am trying to index XML files in Solr using the following command:
java -Durl=http://10.1.11.143:8080/solr/#/ -jar post.jar solr.xml
But I am getting the following error:
SimplePostTool version 1.5
Posting files to base url http://10.1.11.143:8080/solr/#/ using content-type application/xml..
POSTing file solr.xml
SimplePostTool: WARNING: Solr returned an error #500 Internal Server Error
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 500 for URL: http://10.1.11.143:8080/solr/#/
1 files indexed.
COMMITting Solr index changes to http://10.1.11.143:8080/solr/#/..
SimplePostTool: WARNING: Solr returned an error #500 Internal Server Error for url http://10.1.11.143:8080/solr/#/?commit=true
Time spent: 0:00:00.017
Please help me to come out of this error.
Content of solr.xml is as shown in the picture:
The problem is the URL. http://10.1.11.143:8080/solr/#/ is the address of the admin UI, and it does not name any request handler for the update. Use the following command instead; it should work:
java -Durl=http://10.1.11.143:8080/solr/update?commit=true -jar post.jar solr.xml
/update is the requestHandler to index data into Solr.
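As an illustration of why the original URL cannot reach the update handler: everything after `#` is a fragment, which an HTTP client does not send to the server. The sketch below uses Python's standard URL parser; `server_path` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def server_path(url: str) -> str:
    """Return the path and query a server would receive (the fragment is dropped)."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


if __name__ == "__main__":
    # URL from the question: the request ends up at /solr/, not at a handler.
    print(server_path("http://10.1.11.143:8080/solr/#/"))
    # URL from the answer: the request targets the /update handler.
    print(server_path("http://10.1.11.143:8080/solr/update?commit=true"))
```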

I'm following the Nutch tutorial, and getting a "No URLs to fetch" error

Following the Apache Nutch tutorial here:
As indicated in the tutorial, I've set the last line of my regex-urlfilter.txt to:
+^http://([a-z0-9]*\.)*nutch.apache.org/
My nutch-site.xml file contains only the lines
<property>
<name>http.agent.name</name>
<value>My Nutch Spider</value>
</property>
And my seed.txt file is:
http://nutch.apache.org/
However, when I crawl with
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
I get a "No URLs to fetch" error. Anyone know why?
The configuration looks fine to me. Did you make these changes in the runtime/local folder?
seed.txt should be in the NUTCH_HOME/runtime/local/urls folder, and
regex-urlfilter.txt and nutch-site.xml should be in the NUTCH_HOME/runtime/local/conf folder,
where NUTCH_HOME is the installation directory.
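One way to sanity-check the filter line independently of Nutch is to try the pattern against the seed URL. This is only a sketch: it uses Python's `re` module rather than Nutch's own Java regex filter, on the assumption that this simple pattern behaves the same in both, and `accepted` is a hypothetical helper.

```python
import re

# The pattern from the last line of regex-urlfilter.txt, without the leading "+"
# (the "+" marks the rule as an include rule and is not part of the regex).
URL_FILTER = re.compile(r"^http://([a-z0-9]*\.)*nutch.apache.org/")


def accepted(url: str) -> bool:
    """Return True if the include pattern matches the URL."""
    return URL_FILTER.search(url) is not None


if __name__ == "__main__":
    print(accepted("http://nutch.apache.org/"))   # the seed URL from seed.txt
    print(accepted("https://nutch.apache.org/"))  # an https URL does not match this rule
    print(accepted("http://www.example.com/"))
```

Since the pattern itself accepts the seed URL, the file locations described above are the more likely place to look.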

GAE - Unable to update: com.google.appengine.tools.admin.HttpIoException:

I tried to upload my test GWT app but ran into a rather strange error. Every time I try to upload the app I get this:
Unable to update app: Error posting to URL:
https://appengine.google.com/api/appversion/create?app_id=gwttestapp001&version=1.0&
500 Internal Server Error
Server Error (500) A server error has occurred.
See the deployment console for more details Unable to update app:
Error posting to URL:
https://appengine.google.com/api/appversion/create?app_id=gwttestapp001&version=1.0&
500 Internal Server Error
Server Error (500) A server error has occurred.
... console says this
Skipping GWT compilation since no relevant changes have occurred since
the last deploy. Created staging directory at:
'C:\DOCUME~1\1\LOCALS~1\Temp\appcfg4973998929980348825.tmp' Scanning
for jsp files. Scanning files on local disk. Initiating update.
com.google.appengine.tools.admin.HttpIoException: Error posting to
URL:
https://appengine.google.com/api/appversion/create?app_id=gwttestapp001&version=1.0&
500 Internal Server Error
Server Error (500) A server error has occurred.
Debugging information may be found in C:\Documents and
Settings\1\Local Settings\Temp\appengine-deploy1308974562331110258.log
... and error log says this:
Unable to update: com.google.appengine.tools.admin.HttpIoException:
Error posting to URL:
https://appengine.google.com/api/appversion/create?app_id=gwttestapp001&version=1.0&
500 Internal Server Error
Server Error (500) A server error has occurred.
at com.google.appengine.tools.admin.AbstractServerConnection.send1(AbstractServerConnection.java:281)
at com.google.appengine.tools.admin.AbstractServerConnection.send(AbstractServerConnection.java:234)
at com.google.appengine.tools.admin.AbstractServerConnection.post(AbstractServerConnection.java:213)
at com.google.appengine.tools.admin.AppVersionUpload.send(AppVersionUpload.java:606)
at com.google.appengine.tools.admin.AppVersionUpload.beginTransaction(AppVersionUpload.java:414)
at com.google.appengine.tools.admin.AppVersionUpload.doUpload(AppVersionUpload.java:122)
at com.google.appengine.tools.admin.AppAdminImpl.doUpdate(AppAdminImpl.java:328)
at com.google.appengine.tools.admin.AppAdminImpl.update(AppAdminImpl.java:52)
at com.google.appengine.eclipse.core.proxy.AppEngineBridgeImpl.deploy(AppEngineBridgeImpl.java:265)
at com.google.appengine.eclipse.core.deploy.DeployProjectJob.runInWorkspace(DeployProjectJob.java:144)
at org.eclipse.core.internal.resources.InternalWorkspaceJob.run(InternalWorkspaceJob.java:38)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
I just updated GAE from 1.5.2 to 1.6.2 but the error persists :(
How can I fix it?
Well... I had to dig deeper into this kind of problem myself.
I think I finally found the root cause :S The problem is the syntax of my app version.
My appengine-web.xml file contained
<?xml version="1.0" encoding="UTF-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
<application>gwttestapp001</application>
<version>1.0</version>
</appengine-web-app>
... but according to a tutorial I found, "there is no way to use dots" in the version string, so I changed the content to
<?xml version="1.0" encoding="UTF-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
<application>gwttestapp001</application>
<version>1</version>
</appengine-web-app>
... and everything uploaded successfully :)
I hope this tip saves someone's day.
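A pre-deploy check for this particular mistake could look like the sketch below. It only tests for a dot in the `<version>` element, which is the one rule stated above; `version_has_dot` and the `TEMPLATE` text are hypothetical, and any other naming rules App Engine may enforce are not covered.

```python
import xml.etree.ElementTree as ET

# Namespace used by the appengine-web.xml files shown above.
NS = {"gae": "http://appengine.google.com/ns/1.0"}

# Illustrative template mirroring the two files above (XML declaration omitted).
TEMPLATE = (
    '<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">'
    "<application>gwttestapp001</application>"
    "<version>{version}</version>"
    "</appengine-web-app>"
)


def version_has_dot(appengine_web_xml: str) -> bool:
    """Return True if the <version> element contains a dot."""
    root = ET.fromstring(appengine_web_xml)
    version = root.findtext("gae:version", default="", namespaces=NS)
    return "." in version


if __name__ == "__main__":
    print(version_has_dot(TEMPLATE.format(version="1.0")))  # the failing config
    print(version_has_dot(TEMPLATE.format(version="1")))    # the working config
```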
Here is a solution that worked for me:
properties starts
<appengine.app.version>10</appengine.app.version>
<appengine.target.version>1.8.7</appengine.target.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
properties ends
configuration starts
<version>${appengine.app.version}</version>
configuration ends
In the configuration section, set the version to the property defined in the properties section, as shown above. Hope this helps.

Resources