Solr Arabic Search - solr

I want to implement Arabic search in Solr. I am able to index the documents but not able to search them. When I fetch a document by ID I get it back, but a search using Arabic words returns nothing.
Search URL
http://122.166.9.144:8080/solr/tw/select/?q=تأجير الاهلي
Search Response
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">18</int>
<lst name="params">
<str name="q">تأجÙر اÙاÙÙÙ</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>
What could be the problem?
Thanks,
Rohit
Edit: Request/Response Headers
Response Headers
Server Apache-Coyote/1.1
Content-Type application/xml;charset=UTF-8
Transfer-Encoding chunked
Date Mon, 15 Aug 2011 15:37:25 GMT
Request Headers
Host 122.166.9.144:8080
User-Agent Mozilla/5.0 (Windows NT 6.0; rv:5.0) Gecko/20100101 Firefox/5.0
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip, deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection keep-alive

Apparently the server fails to decode the Arabic text in the URL using the right charset. It looks vaguely like it got UTF-8 but thought it was Latin-1. Have you tried wiresharking the conversation to see exactly which URL bytes get sent to the server?
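To illustrate the kind of mismatch described above, here is a minimal sketch in Python. It assumes the browser sends the query as percent-encoded UTF-8 (which is what modern browsers normally do) and compares decoding those bytes as UTF-8 with decoding them as Latin-1. The variable names are illustrative only.

```python
from urllib.parse import quote, unquote_to_bytes

query = "تأجير الاهلي"

# What a browser is expected to put on the wire: UTF-8 bytes, percent-encoded.
encoded = quote(query, safe="")
raw_bytes = unquote_to_bytes(encoded)

# Correct decoding on the server side.
as_utf8 = raw_bytes.decode("utf-8")

# Wrong decoding: every UTF-8 byte is treated as one Latin-1 character,
# so each two-byte Arabic letter turns into two unrelated characters.
as_latin1 = raw_bytes.decode("latin-1")

print(encoded)
print(as_utf8 == query)
print(len(query), len(as_latin1))
print(ascii(as_latin1))
```

If the echoed `q` parameter in the response resembles the Latin-1 result, the request bytes are probably fine and the decoding on the server is the part to look at. If the server is Tomcat, as the `Apache-Coyote` header suggests, the connector's `URIEncoding` attribute in server.xml is one setting that controls this; whether it is the cause here is an assumption.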

Related

Windows solr post - Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)

My core c2020 is running and available when I visit http://localhost:8983/solr/#/c2020/query.
Locally, when I try to run:
solr-7.7.2> java -jar -Dc=c2020 example\exampledocs\post.jar "C:\temp\path_to\a_doc.pdf"
I get:
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/c2020/update using content-type application/xml...
POSTing file A Half Century of Macro Momentum_vf.pdf to [base]
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/c2020/update
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">6</int>
</lst>
<lst name="error">
<lst name="metadata">
<str name="error-class">org.apache.solr.common.SolrException</str>
<str name="root-error-class">java.io.CharConversionException</str>
</lst>
<str name="msg">Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)</str>
<int name="code">400</int>
</lst>
</response>
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/c2020/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/c2020/update...
Time spent: 0:00:00.310
Now, if I run:
java -Durl=http://localhost:8983/solr/c2020/update/extract -jar example\exampledocs\post.jar "C:\temp\path_to\a_doc.pdf"
It works:
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/c2020/update/extract using content-type application/xml...
POSTing file a_doc.pdf to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/c2020/update/extract...
Time spent: 0:00:14.647
But it does not send all "fields" I want to see such as the file path or the file name.
I'd like to get the raw post doc to work if anyone could advise.

Can Solr Index ASPX Files?

When I try to index ASPX files using Solr, I keep getting an error. My ASPX file is very standard. Does anyone know if Solr just isn't able to index ASPX files? If I change the extension from ASPX to HTML, Solr can index the file just fine, which I find very odd.
The command I use is (I'm a Windows user):
java -Dc=collection -Drecursive -Dfiletypes=aspx -jar example/exampledocs/post.jar c:/folder
I get the errors:
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/collection/update
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">1</int>
</lst>
<lst name="error">
<lst name="metadata">
<str name="error-class">org.apache.solr.common.SolrException</str>
<str name="root-error-class">com.ctc.wstx.exc.WstxUnexpectedCharException</str>
</lst>
<str name="msg">Unexpected character '=' (code 61); expected a semi-colon after the reference for entity 'l'
at [row,col {unknown-source}]: [8,43]</str>
<int name="code">400</int>
</lst>
</response>
In the Solr log:
org.apache.solr.common.SolrException: Unexpected character '=' (code 61); expected a semi-colon after the reference for entity 'l'

SolrCloud backup data not found on my system path

When I call the SolrCloud (6.3.0) Backup API using the backup URL,
it returns response status 0, i.e. no error:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">213</int>
</lst>
<lst name="success">
<lst name="localhost:8983_solr">
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">43</int>
</lst>
</lst>
</lst>
</response>
But I didn't find any locations_back folder with backed-up data on my system.
When I ran this curl command from the command line, it printed some numbers like 32058, 32059, 32060, etc.:
curl http://localhost:8983/solr/admin/collections?action=BACKUP&name=locations_back&collection=locations&location=/opt/solr/server/solr
[1] 32058
[2] 32059
[3] 32060
I tested with the location parameter set to location=/opt/solr/server/solr and to location=/
But neither produced a locations_back folder with backed-up data on my system.
To troubleshoot this issue, please check the following:
Change the backup folder from /opt/solr/server/solr to /tmp/. That will help you understand whether it's a permission problem.
Is your collection really called locations (as one of your params is collection=locations)? Change it as necessary.
Check the Solr server logs to see what the error is.
Is SSH enabled, or are you hitting a firewall?
Are your Solr nodes healthy?
Are you calling that URL through a browser or programmatically?
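One more thing worth checking in the curl call: the `[1] 32058` style lines look like shell job numbers, which is what a shell prints when an unquoted `&` sends a command to the background. If that is what happened, only the part of the URL before the first `&` reached Solr. The sketch below builds the same request with the standard library and quotes it for a shell. The `/tmp` location follows the suggestion above and is an assumption about your setup.

```python
import shlex
from urllib.parse import urlencode

base = "http://localhost:8983/solr/admin/collections"

# Parameters taken from the question; "/tmp" is an assumed test location.
params = {
    "action": "BACKUP",
    "name": "locations_back",
    "collection": "locations",
    "location": "/tmp",
}

# safe="/" keeps the path readable instead of encoding it as %2F.
url = f"{base}?{urlencode(params, safe='/')}"

# shlex.quote wraps the URL in single quotes so the shell passes
# the "&" characters through instead of treating them as job control.
command = "curl " + shlex.quote(url)

print(command)
```

Running the printed command should send all four parameters in a single request.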

Cannot get highlighted Solr response

I'm using the Solr example server to do an investigation. After feeding it all the cached documents, mostly HTML files, it works fine except for the highlighting part.
The request URL I'm using is as follows:
http://localhost:8983/solr/collection1/select?q=keyword&wt=xml&hl=true
And the XML response is as follows:
<response>
<lst name="responseHeader">...</lst>
<result name="response" numFound="371" start="0">
<doc>
<arr name="links">
<str>rect</str>
<str>FJU_KDJFJJ_DJ_13</str>
</arr>
<str name="id">
F:\SkyDrive\funproj\cache\adfadf\asdff.htm
</str>
<arr name="title">
<str>asdff.htm</str>
</arr>
<arr name="content_type">
<str>text/html; charset=ISO-8859-1</str>
</arr>
<str name="resourcename">
F:\SkyDrive\funproj\cache\adfadf\asdff.htm
</str>
<arr name="content">
<str>...</str>
</arr>
<long name="_version_">1418589758873927680</long>
</doc>
<doc>...</doc>
</result>
<lst name="highlighting">
<lst name="F:\SkyDrive\funproj\cache\adfadf\asdff.htm"/>
<lst name="F:\SkyDrive\funproj\cache\cvzcv\c58053e10vq.htm"/>
<lst name="F:\SkyDrive\funproj\cache\hgdfhdfgh\c00302e10vq.htm"/>
<lst name="F:\SkyDrive\funproj\cache\asdfasdf\c00945e10vq.htm"/>
<lst name="F:\SkyDrive\funproj\cache\hjmyukt\asfdf06113002_03312010.htm"/>
<lst name="F:\SkyDrive\funproj\cache\nmvbmnm\saf0q033111.htm"/>
<lst name="F:\SkyDrive\funproj\cache\lkiullkl\a10-5974_110q.htm"/>
<lst name="F:\SkyDrive\funproj\cache\jhlhjkl\fdfinal.htm"/>
<lst name="F:\SkyDrive\funproj\cache\vcbxcbvcx\zynex10q33110_5132010.htm"/>
<lst name="F:\SkyDrive\funproj\cache\yuiuiou\v185403_10q.htm"/>
</lst>
</response>
The response, whether JSON or XML, does not contain any highlighted snippets at all. I've checked solrconfig.xml both on the local file system and in the admin page of the example server. Highlighting is on by default and pre/post are set to ""/"". The example search portal itself works fine with highlighting in its results, but since it's not AJAX, there's no way for me to inspect its requests through Chrome.
What did I do wrong?
You have to specify the fields to be highlighted using hl.fl. For example, if you want to search and highlight hits in the content field, you can use the query below:
http://localhost:8983/solr/collection1/select?q=content:keyword&wt=xml&hl=true&hl.q=content:keyword&hl.fl=content
By default, the highlighting response returns only one snippet per field, even if the field has multiple hits. The snippet length (fragsize) is also set to 100 characters by default.
You can use hl.snippets and hl.fragsize to change these.
For example, to change fragsize:
http://localhost:8983/solr/collection1/select?q=content:keyword&wt=xml&hl=true&hl.q=content:keyword&hl.fl=content&hl.fragsize=5000
Passing hl.fragsize=0 makes the fragment size unlimited, so the whole field value is used.
To change the number of snippets:
http://localhost:8983/solr/collection1/select?q=content:keyword&wt=xml&hl=true&hl.q=content:keyword&hl.fl=content&hl.snippets=10
Refer to the Solr wiki for more parameters.
You need to add the hl.fl parameter, naming the field on which highlighting should be enabled.
The default value for this parameter is blank.
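As a rough sketch of how these parameters fit together, the helper below assembles a highlighting request URL. The function name `build_highlight_url` is hypothetical and not part of any Solr client; the base URL and field name come from the question, and the defaults mirror the values mentioned above. It leaves out hl.q, which falls back to q when omitted.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, field, keyword, snippets=1, fragsize=100):
    """Build a Solr select URL that asks for highlighting on one field."""
    params = {
        "q": f"{field}:{keyword}",
        "wt": "xml",
        "hl": "true",
        "hl.fl": field,            # field(s) to highlight; blank by default
        "hl.snippets": snippets,   # maximum snippets per field
        "hl.fragsize": fragsize,   # snippet length in characters
    }
    return f"{base_url}?{urlencode(params)}"


url = build_highlight_url(
    "http://localhost:8983/solr/collection1/select",
    "content",
    "keyword",
    snippets=10,
    fragsize=5000,
)
print(url)
```

Note that urlencode percent-encodes the colon in `content:keyword`, which Solr decodes back before parsing the query.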

Solr 3.2 edismax

I am trying to use the edismax defType and am running into the following error.
HTTP ERROR : 400
Unknown query type 'edismax'
The request handler in the solrconfig.xml file looks as follows
<requestHandler name="foobar" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="qf">block</str>
<str name="q.alt">*:*</str>
</lst>
</requestHandler>
My goal is to do wildcard searches with this search handler.
We recently upgraded from Solr 1.4 to 3.2. Is there a setting or config that has to be changed to allow edismax?
Thanks!
HTTP ERROR : 400 Unknown query type 'edismax'
It's indicating an invalid query type parameter, which is qt and not defType.
Are you trying to use qt=edismax? If so, that might result in this error, as the request handler is named foobar.
You can rename foobar to edismax or use qt=foobar.

Resources