Solr 5: only first word gets highlighted

Search for: "test string"
Highlighting I get: <em>test</em> string
This has been reported as a bug and is allegedly fixed:
Solr: Multi Word Synonyms : Only first word is highlighting
However, here's my version of Lucene:
<luceneMatchVersion>5.0.0</luceneMatchVersion>
How is it possible that I'm still getting this behaviour?
EDIT:
There are no special settings related to highlighting in my solrconfig.xml
Here is the query I use:
hl=true
&hl.simple.pre=<em>
&hl.simple.post=</em>
&hl.fl=Comments,Summary

My problem was that the application was incorrectly parsing tags returned by the highlighter - not Solr's fault at all!
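For anyone hitting the same symptom, here is a minimal sketch of client-side parsing for a highlighted snippet. It assumes the snippet uses the <em> and </em> markers from the query above and that each matched term may arrive wrapped in its own pair of tags. The function names are hypothetical, not part of any Solr client.

```python
import re

# Matches one highlighted run; non-greedy so adjacent runs stay separate.
EM_PATTERN = re.compile(r"<em>(.*?)</em>", re.DOTALL)


def highlighted_terms(snippet):
    """Return every highlighted run in the snippet, in order."""
    return EM_PATTERN.findall(snippet)


def strip_highlight_tags(snippet):
    """Return the snippet text with the <em> markers removed."""
    return EM_PATTERN.sub(r"\1", snippet)


snippet = "a <em>test</em> <em>string</em> here"
print(highlighted_terms(snippet))
print(strip_highlight_tags(snippet))
```

A parser that stops after the first closing tag would report only the first word, which looks exactly like the problem described in the question.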

Related

Solr - How to fix "Error adding field ... msg=For input string" when post data to core

I am new to Solr.
I created a Solr (8.1.0) core using SolrCloud for testing, and tried to post data as a JSON file.
When an object has a float value like "spalte412": "35.5", or a value with special characters, it throws an error in the console:
SimplePostTool: WARNING: Response: {
"responseHeader":{
"rf":2,
"status":400,
"QTime":223},
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","java.lang.NumberFormatException"],
"msg":"ERROR: [doc=52] Error adding field 'spalte421'='156.6' msg=For input string: \"156.6\"",
"code":400}}
I tried to edit core Schema by adding the field, in the Admin UI, without success.
Thanks for your help!
If you're not pre-defining your fields, the type of each field is determined by the first document submitted that contains it. Solr guesses the field type from that first value, and in this case the guessed type doesn't match the format you're sending in later documents.
Schemaless mode is neat for prototyping, but when moving to production you should always add the fields up front with the correct types, so you don't get surprises like the one above when documents are submitted in a different order (or with different content) than during development.
You can define fields in schema.xml or through the Schema API.
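To see why the order of documents matters, here is a toy model of type guessing. It is a simplified sketch, not Solr's actual implementation: the class GuessingSchema and the type labels "long", "double" and "string" are made up for illustration.

```python
def guess_type(value):
    """Guess a field type from a single string value (toy rules)."""
    try:
        int(value)
        return "long"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        return "string"


class GuessingSchema:
    """Remembers the type guessed from the first value seen for each field."""

    def __init__(self):
        self.fields = {}

    def add(self, name, value):
        # The first value decides the type; later values must parse as that type.
        field_type = self.fields.setdefault(name, guess_type(value))
        if field_type == "long":
            int(value)      # raises ValueError for "156.6"
        elif field_type == "double":
            float(value)
        return field_type


schema = GuessingSchema()
schema.add("spalte421", "156")        # field is now locked to "long"
try:
    schema.add("spalte421", "156.6")  # same field, but a float this time
except ValueError as err:
    print("rejected:", err)
```

If the first document had carried "156.6" instead, the toy model would have guessed "double" and both values would have been accepted, which is why the failure appears to depend on submission order.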
You should post your schema.xml and a short description of what you did before.
"root-error-class","java.lang.NumberFormatException"
Sounds like Solr was unable to parse the value as the number format it expected while you were adding a document containing that string (For input string: \"156.6\").
In other words, there is a mismatch between the delivered format and the expected one.
Thanks guys.
Indeed, I solved it by deleting the fields in the Admin UI and defining them with:
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' http://localhost:8983/solr/films/schema

Error adding field 'field_name'-'field_value' msg=For input string: \"field_Value\"

We occasionally struggle to import certain files into Solr. It seems that some documents have weird metadata values; we're not sure whether they come from an eccentric word processor or something else. Here are two examples:
Type: Solarium\Exception\HttpException
Message: Solr HTTP error: OK (400)
{"responseHeader":{"status":400,"QTime":49},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.NumberFormatException"],"msg":"ERROR: [doc=3932487729] Error adding field 'brightness_value'='6.18' msg=For input string: \"6.18\"","code":400}}
And
Type: Solarium\Exception\HttpException
Severity: error --> Exception: Solr HTTP error: OK (400)
{"responseHeader":{"status":400,"QTime":72},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.NumberFormatException"],"msg":"ERROR: [doc=16996] Error adding field 'version'='5.3.1' msg=For input string: \"5.3.1\"","code":400}}
How do we prevent these issues? We are not in control of the documents, so we need to fix it on the server.
Define the field type explicitly in the schema instead of relying on Solr to create the field type for you - the first document that contains the field will make Solr guess the type of the field, and if later documents don't match that expected format, you'll get an error like this.
Always define the schema for a collection when using it in production or in an actual application - the schemaless mode is really neat for prototyping and experimenting, but in an actual application you want the types to be well defined.
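As a rough sketch, the two fields from the errors above could be declared with Schema API add-field commands like the ones built below. The type names pfloat and string are assumptions (they exist in recent default configsets, but check which types your schema defines), and add_field_command is a hypothetical helper. The code only builds the JSON bodies; each would then be POSTed to the collection's schema endpoint.

```python
import json


def add_field_command(name, field_type, stored=True, multi_valued=False):
    """Build the JSON body for one Schema API add-field command."""
    return json.dumps({
        "add-field": {
            "name": name,
            "type": field_type,       # assumed type name, see note above
            "stored": stored,
            "multiValued": multi_valued,
        }
    })


commands = [
    # '6.18' is a decimal number, so a floating-point type fits.
    add_field_command("brightness_value", "pfloat"),
    # '5.3.1' is not a number at all, so keep it as a plain string.
    add_field_command("version", "string"),
]

for body in commands:
    print(body)
```

Declaring version as a string is the important part: no numeric type can hold a value like 5.3.1, whichever document arrives first.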

After upgrading from Solr 3.5 to Solr 4.7 some queries return error

The following query worked fine in Solr 3.5:
http://localhost:6060/solr/newsarchive/select/?q=WebSite:www.shorouknews.com&sort=Date%20desc&version=2.2&start=&rows=10&indent=on&wt=json
However, it generates the following error with Solr 4.7. I tried changing <luceneMatchVersion>LUCENE_35</luceneMatchVersion> to LUCENE_40, but the error still occurs. Is it an issue with schema.xml, or with the index? Other simple queries work fine, such as http://localhost:8983/solr/newsarchive4/select?q=%D9%85%D8%B5%D8%B1&wt=json&indent=true
{
"responseHeader":{
"status":500,
"QTime":35,
"params":{
"sort":"Date desc",
"indent":"on",
"start":"",
"q":"WebSite:www.shorouknews.com",
"wt":"json",
"rows":"10",
"version":"2.2"}},
"error":{
"msg":"For input string: \"\"",
"trace":"java.lang.NumberFormatException: For input string: \"\"\r\n\tat java.lang.NumberFormatException.forInputString(Unknown Source)\r\n\tat java.lang.Integer.parseInt(Unknown Source)\r\n\tat java.lang.Integer.parseInt(Unknown Source)\r\n\tat org.apache.solr.search.QParser.getSort(QParser.java:244)\r\n\tat org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:151)\r\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:196)\r\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\r\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:768)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:205)\r\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\r\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\r\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\r\n\tat 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\r\n\tat org.eclipse.jetty.server.Server.handle(Server.java:368)\r\n\tat org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\r\n\tat org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\r\n\tat org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\r\n\tat org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\r\n\tat org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\r\n\tat org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\r\n\tat org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\r\n\tat org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\r\n\tat java.lang.Thread.run(Unknown Source)\r\n",
"code":500}}
EDIT
I noticed that start in the query is not defined, i.e. &start=&. Other versions of Solr treated it as 0, but 4.7 treats it as unknown. The question becomes: how can I make Solr assign 0 to an undefined start?
Well, it's not undefined; it's actually present in the URL (which is why you're getting the error: you're trying to set it to an empty string). You could try to supply it as a default to the SearchHandler, but I'm not sure that will actually help since, as mentioned, the value is present. It's just empty.
<lst name="defaults">
<int name="start">0</int>
</lst>
You'd be better off fixing the reason why you're sending an empty start= parameter instead, or possibly, rewriting it in your container before the query reaches Solr. How you do that depends on which application container you're using.
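If the URL is built by your own client code, one option is to drop empty parameters before the request is sent. The following is a minimal sketch using only the Python standard library; drop_empty_params is a hypothetical helper name, and it assumes that removing an empty parameter is equivalent to not sending it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_empty_params(url):
    """Return the URL with every empty query parameter (e.g. start=) removed."""
    parts = urlsplit(url)
    # keep_blank_values=True makes 'start=' visible so it can be filtered out.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if value != ""]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = ("http://localhost:6060/solr/newsarchive/select/"
       "?q=WebSite:www.shorouknews.com&sort=Date%20desc&start=&rows=10&wt=json")
print(drop_empty_params(url))
```

With start removed from the query string, Solr falls back to whatever default applies, including one configured in the handler's defaults section.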

defaultHighLight of solr3.6 does not work?

I am using Solr 3.6.2 and found that the default highlighter does not behave the way it did in Solr 1.4.1.
Is it a bug?
example:
I use a 2-gram tokenizer.
text: testabctest123456testabc
index: te es st ta ab bc ct te es st t1 12 23 ・・・
query: test
parameters:
hl=true
hl.fragsize=200
hl.simple.pre={{{
hl.simple.post=}}}
hl.highlightMultiTerm=true
hl.usePhraseHighlighter=true
In Solr 3.6.2 the default highlighter result is: {{{testabctest123456test}}}abc
In Solr 1.4.1 the highlighter result is: {{{test}}}abc{{{test}}}123456{{{test}}}abc
In Solr 3.6.2 the FastVectorHighlighter result is: {{{test}}}abc{{{test}}}123456{{{test}}}abc
What happened to the default highlighter in Solr 3.6?
Though the FastVectorHighlighter works well, I need to use the default highlighter.
No, it's not a bug. That's how the highlighter works. If it finds multiple matches within the same term, it wraps them all. As far as I can tell, there is no configuration option to change this behavior.
Seems like using the Highlighter implementation that does what you want would be the logical approach.
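To make the example from the question concrete, here is a toy sketch of 2-gram tokenization and of per-match wrapping. It is not how either Solr highlighter is implemented; the function names are made up, and it only reproduces the index tokens and the per-match output quoted in the question.

```python
def bigrams(text):
    """Split text into overlapping 2-character tokens."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def phrase_starts(tokens, query_tokens):
    """Token positions where the query tokens occur as a consecutive run."""
    n = len(query_tokens)
    return [i for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == query_tokens]


def highlight(text, query, pre="{{{", post="}}}"):
    """Wrap each separate occurrence of the query in pre/post markers."""
    out, pos = [], 0
    # With 2-grams, token position i starts at character i of the text.
    for start in phrase_starts(bigrams(text), bigrams(query)):
        if start < pos:
            continue  # skip occurrences overlapping the previous match
        out.append(text[pos:start])
        out.append(pre + text[start:start + len(query)] + post)
        pos = start + len(query)
    out.append(text[pos:])
    return "".join(out)


text = "testabctest123456testabc"
print(" ".join(bigrams(text)[:13]))
print(highlight(text, "test"))
```

The second line printed matches the Solr 1.4.1 and FastVectorHighlighter results from the question, where each occurrence is wrapped on its own.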

Solr Adding PDF via ExtractRequestHandler

I am trying to use the Solr handler to add a PDF document to the index but keep getting a missing Unique field error (even though I am providing the field). Here is the request:
D:\Downloads\solr-4.6.0\solr-4.6.0\example\exampledocs>c:\temp\curl "http://localhost:8983/solr/update/extract?commit=true&literal.MessageID=2b071dce-d7a6-4b7c-9a09-33cc93f96db9" -F "myfile=#Wizards vs Warriors tickets.pdf"
The error I get back is:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">135</int></lst><lst name="error"><str name="msg">Document is missing mandatory uniqueKey field: MessageID</str><int name="code">400</int></lst>
</response>
In my REST call I am using literal.MessageID=... but it still seems to not find it.
Any ideas on how I can troubleshoot this? (NOTE: I did find an article on SO about problems with fields ending in ID, so I removed the ID from the field name in the schema and modified the literal.Message parameter accordingly, but I still get the same issue.)
Thanks,
I tried your test case and got the same result. After several more tests, I found that the required unique key field name should always be lowercase. That's why it doesn't work with MessageID.
Try with messageid. It'll work.
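As a small sketch of that suggestion, the request URL can be built with the literal field name lowercased. This follows the finding above rather than any documented rule, and extract_url is a hypothetical helper; the base URL and ID are the ones from the question.

```python
from urllib.parse import urlencode


def extract_url(base, unique_key, value, commit=True):
    """Build an /update/extract URL that passes the unique key as a literal field."""
    params = {
        "commit": "true" if commit else "false",
        # Lowercase the field name, per the finding in the answer above.
        "literal." + unique_key.lower(): value,
    }
    return base + "/update/extract?" + urlencode(params)


print(extract_url("http://localhost:8983/solr", "MessageID",
                  "2b071dce-d7a6-4b7c-9a09-33cc93f96db9"))
```

The uniqueKey declared in schema.xml would need to use the same lowercase name for the document to be accepted.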
