Solr Adding PDF via ExtractRequestHandler

I am trying to use the Solr handler to add a PDF document to the index but keep getting a missing Unique field error (even though I am providing the field). Here is the request:
D:\Downloads\solr-4.6.0\solr-4.6.0\example\exampledocs>c:\temp\curl "http://localhost:8983/solr/update/extract?commit=true&literal.MessageID=2b071dce-d7a6-4b7c-9a09-33cc93f96db9" -F "myfile=#Wizards vs Warriors tickets.pdf"
The error I get back is:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">135</int></lst>
<lst name="error"><str name="msg">Document is missing mandatory uniqueKey field: MessageID</str><int name="code">400</int></lst>
</response>
In my REST call I am using literal.MessageID=..., but Solr still does not seem to find it.
Any ideas on how I can troubleshoot this? (Note: I did find an article on SO about problems with fields ending in ID, so I removed the ID from the field name in the schema and changed the parameter to literal.Message, but I still get the same issue.)
Thanks,

I tried your test case and got the same result. After several more test cases, I found that the name of the required unique key field has to be all lowercase. That's why MessageID doesn't work.
Try it with messageid instead. It'll work.
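For illustration, here is a minimal Python sketch of building the same request URL with the field name lowercased, following the finding above. The helper name `extract_url` and the base URL are assumptions made for this example, and the schema's uniqueKey would also have to be named `messageid` for the literal to match.

```python
from urllib.parse import urlencode

def extract_url(base, unique_key, doc_id, commit=True):
    """Build an /update/extract URL that passes the unique key as a literal.

    The field name is lowercased, per the finding above that a
    mixed-case name such as MessageID was not picked up.
    """
    params = {"literal." + unique_key.lower(): doc_id}
    if commit:
        params["commit"] = "true"
    return base.rstrip("/") + "/update/extract?" + urlencode(params)

# Hypothetical base URL; the ID is the one from the question.
url = extract_url("http://localhost:8983/solr", "MessageID",
                  "2b071dce-d7a6-4b7c-9a09-33cc93f96db9")
print(url)
```

The file itself would still be sent as the multipart body of the request, as in the curl call in the question.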

Related

Solr - How to fix "Error adding field ... msg=For input string" when post data to core

I am new to Solr.
I created a Solr (8.1.0) core using SolrCloud for testing, and I am trying to post data as a JSON file.
When an object has a float value such as "spalte412": "35.5", or a value with special characters, it throws an error in the console:
SimplePostTool: WARNING: Response: {
"responseHeader":{
"rf":2,
"status":400,
"QTime":223},
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","java.lang.NumberFormatException"],
"msg":"ERROR: [doc=52] Error adding field 'spalte421'='156.6' msg=For input string: \"156.6\"",
"code":400}}
I tried to edit the core's schema by adding the field in the Admin UI, without success.
Thanks for your help!
If you're not pre-defining your fields, the type of each field is determined by the first document submitted that contains that field. Solr guesses the field type from that first value, and in this case the guessed type doesn't match the format you're sending in later documents.
The schemaless mode is neat for prototyping, but when moving to production you should always add the fields up front with the correct types so you don't suddenly get any surprises (as above) when the documents are submitted in a different order (or different documents) than when developing.
You can define fields in schema.xml or through the SchemaAPI.
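To see which fields are at risk before indexing, one option is to scan all documents up front and find the narrowest type that fits every value of each field. The sketch below is only an approximation of the idea: the labels `long`, `double` and `string` are names chosen for this example rather than Solr field type names, Python's `int()` and `float()` are stand-ins for Solr's own parsing, and the sample documents are made up.

```python
def narrowest_type(values):
    """Return the narrowest of 'long', 'double', 'string' that fits every value."""
    def fits(parse):
        try:
            for v in values:
                parse(v)
            return True
        except ValueError:
            return False
    if fits(int):
        return "long"
    if fits(float):
        return "double"
    return "string"

def scan(docs):
    """Collect the values seen per field across all documents and type each field."""
    seen = {}
    for doc in docs:
        for field, value in doc.items():
            seen.setdefault(field, []).append(str(value))
    return {field: narrowest_type(vals) for field, vals in seen.items()}

# Made-up sample: the first document alone would suggest an integer type.
docs = [{"spalte421": "156"}, {"spalte421": "156.6"}, {"name": "abc"}]
types = scan(docs)
print(types)
```

A field typed from the first document alone would be guessed as an integer here, which is the situation that produces the NumberFormatException once "156.6" arrives.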
You should post the schema.xml and a short description of what you did before.
"root-error-class","java.lang.NumberFormatException"
Sounds like Solr was unable to parse that number format while you were trying to add a document with a string value (For input string: \"156.6\").
Sounds like you have a mismatch between the delivered and the expected format.
Thanks guys.
Indeed, I solved it by deleting the fields in the Admin UI and defining them with:
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' http://localhost:8983/solr/films/schema

Solr HTTP Api - response status

When using Solr via its HTTP API, it responds with an object called responseHeader, in which it puts the status of the response:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">13</int>
<!-- ... -->
</lst>
<!-- ... -->
</response>
My first question: what are the possible values of the status attribute, and what do they mean? I know that the value 0 means a successful call.
My second question: normally, Solr responds with a 200 code for a successful call. Can we get a status other than 0 (in case of an error) and still get 200 as the HTTP response code?
I use Solr 4.6.
NB: I am asking because my Solr doesn't index some of the collections, although it doesn't raise any error and the response code is 200!
Thank you
The status code usually corresponds to the HTTP code given back, except that 200 is returned as 0. It is implemented as the numeric code carried by a SolrException thrown internally in Solr. When this was asked on the solr-user mailing list a while back, Erik Hatcher gave the following answer:
Is there a reference to this status-codes?
Just the source code. SolrCore#setResponseHeaderValues, which predominately uses the codes specified in SolrException:
BAD_REQUEST( 400 ),
UNAUTHORIZED( 401 ), // not currently used
FORBIDDEN( 403 ),
NOT_FOUND( 404 ),
SERVER_ERROR( 500 ),
SERVICE_UNAVAILABLE( 503 ),
UNKNOWN(0);
The current version of SolrCore can be seen on Github.
You can probably assume that the HTTP error code will match the status value most of the time, but there are certainly exceptions, for example if the HTTP request never reaches Solr at all and the error is returned by Jetty instead (if the installation is severely borked, or possibly if you try to access something other than /solr).
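Since the two values can diverge, a client can check both. The following is a minimal sketch, assuming the response was requested with wt=json; the function name `solr_ok` and the message formats are made up for this example.

```python
import json

def solr_ok(http_status, body):
    """Return (ok, detail) from an HTTP status code and a JSON response body.

    The call counts as successful only when the HTTP code is 200 and
    responseHeader.status is 0.
    """
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None
    if not isinstance(parsed, dict):
        # Not a JSON object: possibly an error page from the container, not Solr.
        return False, "unexpected response body (HTTP %d)" % http_status
    status = parsed.get("responseHeader", {}).get("status")
    if http_status == 200 and status == 0:
        return True, "ok"
    msg = parsed.get("error", {}).get("msg", "no error message")
    return False, "HTTP %d, status %s: %s" % (http_status, status, msg)

print(solr_ok(200, '{"responseHeader":{"status":0,"QTime":13}}'))
```

A successful check only says that Solr accepted the request; it says nothing about whether an import started by that request later produced any documents.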
In your example, the request can be perfectly fine and OK even if the action doesn't end up importing anything into a collection. If your import query returns 0 documents, that isn't an error, and if the import started (an import is usually an async operation), that's still a valid and OK request, even if the end result later isn't valid.
It's probably better to try to understand why your import is failing (and use a separate question for that with all the relevant details about how you're trying to index, what any errors in the log say (you can adjust the log level under Logging in the web interface) and what you expected the result to be), instead of looking at the status field in the response.

Error adding field 'field_name'-'field_value' msg=For input string: \"field_Value\"

We occasionally struggle to import certain files into Solr. It seems that certain documents have weird metadata values; we are not sure whether they come from an eccentric word processor or something else. See two examples here:
Type: Solarium\Exception\HttpException
Message: Solr HTTP error: OK (400)
{"responseHeader":{"status":400,"QTime":49},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.NumberFormatException"],"msg":"ERROR: [doc=3932487729] Error adding field 'brightness_value'='6.18' msg=For input string: \"6.18\"","code":400}}
And
Type: Solarium\Exception\HttpException
Severity: error --> Exception: Solr HTTP error: OK (400)
{"responseHeader":{"status":400,"QTime":72},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.NumberFormatException"],"msg":"ERROR: [doc=16996] Error adding field 'version'='5.3.1' msg=For input string: \"5.3.1\"","code":400}}
How do we prevent these issues? We are not in control of the documents, so need to fix it on the server.
Define the field type explicitly in the schema instead of relying on Solr to create the field type for you. The first document that contains the field makes Solr guess the type of the field, and if later documents don't match that expected format, you'll get an error like this.
Always define the schema for a collection when using it in production or in an actual application - the schemaless mode is really neat for prototyping and experimenting, but in an actual application you want the types to be well defined.
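As a sketch of what defining the two failing fields up front could look like, the snippet below builds Schema API `add-field` commands as JSON. The type names `pdouble` and `string` are assumptions about which field types exist in your schema, and the helper `add_field_command` is made up for this example; the resulting JSON would be POSTed to the collection's /schema endpoint.

```python
import json

def add_field_command(name, field_type, stored=True, multi_valued=False):
    """Build one Schema API add-field command as a plain dict."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "multiValued": multi_valued,
        }
    }

# Assumed types: a floating-point type for 6.18, a plain string for 5.3.1.
commands = [
    add_field_command("brightness_value", "pdouble"),
    add_field_command("version", "string"),
]
for command in commands:
    print(json.dumps(command))
```

Treating `version` as a string reflects that a value like 5.3.1 is not a number at all, so no numeric type would accept it.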

Solr: where to find the Luke request handler

I'm trying to get a list of all the fields, both static and dynamic, in my Solr index. Another SO answer suggested using the Luke Request Handler for this.
It suggests finding the handler at this url:
http://solr:8983/solr/admin/luke?numTerms=0
When I try this url on my server, however, I get a 404 error.
The admin page for my core is here http://solr:8983/solr/#/mycore, so I also tried http://solr:8983/solr/#/mycore/admin/luke. This also gave me another 404.
Does anyone know what I'm doing wrong? Which url should I be using?
First of all you have to enable the Luke Request Handler. Note that if you started from the example solrconfig.xml you probably don't need to enable it explicitly because
<requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
does it for you.
Then, if you need to access the data programmatically, make an HTTP GET request to http://solr:8983/solr/mycore/admin/luke (no hash mark!). The response is in XML by default, but you can get other formats by specifying the wt parameter (e.g. http://solr:8983/solr/mycore/admin/luke?wt=json).
If you only want to see the fields in the Solr web interface, select your core from the drop-down menu and then click on "Schema Browser".
In Solr 6, the solr.admin.AdminHandlers has been removed. If your solrconfig.xml has the line <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />, it will fail to load. You will see errors in the log telling you it failed to load the class org.apache.solr.handler.admin.AdminHandlers.
You must include in your solrconfig.xml the line,
<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
but the URL is core-specific, i.e. http://your_server.com:8983/solr/your_core_name/admin/luke
And you can specify the parameters fl,numTerms,id,docId as follows:
/admin/luke
/admin/luke?fl=cat
/admin/luke?fl=id&numTerms=50
/admin/luke?id=SOLR1000
/admin/luke?docId=2
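The core-specific URL and the parameters listed above can be combined as in this minimal sketch. The helper name `luke_url` is made up for the example, and the host and core names are the placeholders used in the question.

```python
from urllib.parse import urlencode

def luke_url(host, core, **params):
    """Build the core-specific Luke handler URL with optional query parameters."""
    url = "%s/solr/%s/admin/luke" % (host.rstrip("/"), core)
    query = urlencode(params)
    return url + "?" + query if query else url

print(luke_url("http://solr:8983", "mycore", numTerms=0, wt="json"))
print(luke_url("http://solr:8983", "mycore", fl="id", numTerms=50))
```

Note that the path contains the core name and no hash mark, unlike the admin UI URLs.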
You can use the Luke tool, which lets you explore a Lucene index.
You can also use the Solr admin page:
http://localhost:8983/solr/#/core/schema-browser

After upgrading from Solr 3.5 to Solr 4.7 some queries return error

The following query worked fine in Solr 3.5:
http://localhost:6060/solr/newsarchive/select/?q=WebSite:www.shorouknews.com&sort=Date%20desc&version=2.2&start=&rows=10&indent=on&wt=json
However, it generates the following error with Solr 4.7. I tried changing <luceneMatchVersion>LUCENE_35</luceneMatchVersion>
to LUCENE_40, but the error still occurs. Is it an issue with schema.xml, or with the index? Other simple queries work fine, such as http://localhost:8983/solr/newsarchive4/select?q=%D9%85%D8%B5%D8%B1&wt=json&indent=true
{
"responseHeader":{
"status":500,
"QTime":35,
"params":{
"sort":"Date desc",
"indent":"on",
"start":"",
"q":"WebSite:www.shorouknews.com",
"wt":"json",
"rows":"10",
"version":"2.2"}},
"error":{
"msg":"For input string: \"\"",
"trace":"java.lang.NumberFormatException: For input string: \"\"\r\n\tat java.lang.NumberFormatException.forInputString(Unknown Source)\r\n\tat java.lang.Integer.parseInt(Unknown Source)\r\n\tat java.lang.Integer.parseInt(Unknown Source)\r\n\tat org.apache.solr.search.QParser.getSort(QParser.java:244)\r\n\tat org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:151)\r\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:196)\r\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\r\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:768)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:205)\r\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\r\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\r\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\r\n\tat 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\r\n\tat org.eclipse.jetty.server.Server.handle(Server.java:368)\r\n\tat org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\r\n\tat org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\r\n\tat org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\r\n\tat org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\r\n\tat org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\r\n\tat org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\r\n\tat org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\r\n\tat org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\r\n\tat java.lang.Thread.run(Unknown Source)\r\n",
"code":500}}
EDIT
I noticed that start= in the query has no value, i.e. &start=&. Other versions of Solr treated it as 0, but 4.7 rejects it. The question becomes: how can I make Solr assign 0 to an undefined start?
Well, it's not undefined; it's actually present in the URL, which is why you're getting the error: you're trying to set it to an empty string. You could try to supply it as a default to the SearchHandler, but I'm not sure that will actually help, since, as mentioned, the value is present. It's just empty.
<lst name="defaults">
<int name="start">0</int>
</lst>
You'd be better off fixing the reason why you're sending an empty start= parameter instead, or possibly, rewriting it in your container before the query reaches Solr. How you do that depends on which application container you're using.
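If the query string is assembled in client code, one way to fix this at the source is to drop empty parameters before building the URL. A minimal sketch, assuming the parameters are held in a dict; the helper name `clean_params` is made up for this example.

```python
from urllib.parse import urlencode

def clean_params(params):
    """Drop parameters whose value is None or an empty/blank string."""
    return {k: v for k, v in params.items()
            if v is not None and str(v).strip() != ""}

# The parameters from the failing query, including the empty start.
params = {
    "q": "WebSite:www.shorouknews.com",
    "sort": "Date desc",
    "start": "",
    "rows": 10,
    "wt": "json",
}
query = urlencode(clean_params(params))
print(query)
```

With start removed entirely, the request no longer carries an empty value for Solr to parse as a number.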
