Solr not returning highlighted results - solr

I am using Nutch 1.15 and Solr 7.3, and I set up search highlighting as described in the docs: https://lucene.apache.org/solr/guide/7_3/highlighting.html
A normal query against the nutch core works and returns results:
curl http://localhost:8983/solr/nutch/select?q=content:build&wt=json&rows=10000&start=0
With the highlighting query I get the same results, plus a warning: hl.q=content:build: not found
The query with the highlighting parameters looks like this:
curl http://localhost:8983/solr/nutch/select?q=content:build&hl=on&hl.q=content:build&wt=json&rows=10000&start=0
Here is the complete response:
$ curl http://localhost:8983/solr/nutch/select?q=content:build&hl=on&hl.q=content:build&wt=json&rows=10000&start=0
-sh: 8: hl.q=content:build: not found
[3] Done(127) hl.q=content:build
[2] Done curl http://localhost:8983/solr/nutch/select?q=content:build
$ {
"responseHeader":{
"status":0,
"QTime":1,
"params":{
"q":"content:build"}},
"response":{"numFound":2,"start":0,"docs":[
{
"digest":"ff0d20368525b3a0f14933eddb0809db",
"boost":1.0151907,
"id":"https://dummy_url",
"title":"dummy title",
"content":"dummy content",
"_version_":1691148343590256640},
{
"digest":"4fd333469ed5d83ad08eaa7ef0b779c4",
"boost":1.0151907,
"id":"https://dummy_url1",
"title":"dummy title1",
"content":"dummy content1",
"_version_":1691148343603888128}]
}}
Does anyone have an idea how to resolve this? I am not getting any errors in the Nutch or Solr logs.

You're not running the command you think you're running. An unquoted & ends a command and tells the shell to run it in the background, so you're effectively running several commands:
curl http://localhost:8983/solr/nutch/select?q=content:build
hl=on
hl.q=content:build
wt=json
rows=10000
start=0
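This is not what you intend to do. The shell treats hl=on, wt=json, rows=10000 and start=0 as variable assignments, but hl.q is not a valid variable name, so it tries to run hl.q=content:build as a command and reports "not found". Solr only ever receives q=content:build, as the params section of the response shows. Either wrap the URL in double quotes (") or escape the ampersands: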
curl "http://localhost:8983/solr/nutch/select?q=content:build&hl=on&hl.q=content:build&wt=json&rows=10000&start=0"
# or
curl http://localhost:8983/solr/nutch/select?q=content:build\&hl=on\&hl.q=content:build\&wt=json\&rows=10000\&start=0
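A rough Python sketch of the same point, in case you would rather build the URL in code than escape it by hand. The helper names shell_view and build_select_url are made up for illustration, and splitting on & only approximates what a real shell does with an unquoted command line.

```python
from urllib.parse import urlencode


def shell_view(command_line):
    """Approximate how a POSIX shell treats unquoted '&': each '&' ends a
    command (and backgrounds it), so one line becomes several commands."""
    return [part.strip() for part in command_line.split("&") if part.strip()]


def build_select_url(base, params):
    """Join query parameters with '&' and percent-encode the values.
    safe=':' keeps 'content:build' readable instead of 'content%3Abuild'."""
    return base + "?" + urlencode(params, safe=":")


unquoted = ("curl http://localhost:8983/solr/nutch/select?q=content:build"
            "&hl=on&hl.q=content:build&wt=json&rows=10000&start=0")
for command in shell_view(unquoted):
    print(command)

url = build_select_url(
    "http://localhost:8983/solr/nutch/select",
    {"q": "content:build", "hl": "on", "hl.q": "content:build",
     "wt": "json", "rows": 10000, "start": 0},
)
print(url)
```

The first loop prints the six pieces listed above; the last line prints a single URL that can be passed to an HTTP client as one string, so no shell ever gets a chance to split it.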

Related

SOLR backup status request not found

I have been trying to find a way to know when a SOLR backup is done and what its status is. We have a lot of collections to back up. The status request returns an error:
status={state=notfound,msg=Did not find [requestId123] in any tasks queue}
Looking at the SOLR source code, I realized that the reported status (COMPLETED, FAILED, RUNNING, SUBMITTED) comes from the request's state in the overseer queue. When the request is not found in the overseer queue, or when the queue has been cleared, we get this error.
My question: is there any other way to get the SOLR backup status reliably?
Thanks
Taking backup
I am not sure how you are running the backup, nor where you see that error. My assumption is that you are checking the logs, because a similar message appears there.
You also did not mention which Solr version you are using. The answer below is for 8.9, but any version that supports both the v1 and v2 APIs should behave similarly.
To run a backup asynchronously, you can use the following:
curl -X POST http://localhost:8983/api/collections -H 'Content-Type: application/json' -d '
{
"backup-collection": {
"name": "openaccess-v26-backup",
"collection": "openaccess-v26",
"location": "/var/solr/mounted-efs-backup",
"async": "1000"
}
}
'
This starts an asynchronous backup with request id 1000.
Checking action status
You can use the following to check the status of the process:
curl 'http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=1000'
This returns a response like this:
{
"responseHeader":{
"status":0,
"QTime":3},
"status":{
"state":"running",
"msg":"found [1000] in running tasks"}}
The same call works for checking any action, not only backups. For example, you can check the status of a RESTORE action the same way when restoring a backup into a Solr collection.
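A minimal polling sketch in Python, under a few assumptions: the state strings other than "running" and "notfound" (which appear in this thread) are assumed to be lower-case spellings of the states named in the question, and the fetch callable is a placeholder for whatever performs the REQUESTSTATUS call.

```python
import json
import time


def request_state(response_text):
    """Extract the lower-cased 'state' from a REQUESTSTATUS JSON response."""
    return json.loads(response_text)["status"]["state"].lower()


def wait_for_request(fetch, interval_s=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the request leaves the 'submitted'/'running' states.

    fetch is any callable returning the raw JSON text of a REQUESTSTATUS
    call; it is injected so the loop can be tried without a live Solr.
    """
    state = "unknown"
    for _ in range(max_polls):
        state = request_state(fetch())
        if state not in ("submitted", "running"):
            # Assumed terminal values: 'completed', 'failed' or 'notfound'.
            return state
        sleep(interval_s)
    return state


# Canned responses standing in for two successive status calls.
canned = iter([
    '{"status": {"state": "running", "msg": "found [1000] in running tasks"}}',
    '{"status": {"state": "completed", "msg": "illustrative message"}}',
])
print(wait_for_request(lambda: next(canned), sleep=lambda seconds: None))
```

If the loop ends with "notfound" (for example because the queue was cleared, as described in the question), listing the backups is a possible fallback.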
Listing backups
If the approach above does not work for you, another option is to list the backups from time to time and check whether your backup appears in the list.
Note that I am not 100% sure whether a backup can be listed before it has completed, but based on my purely empirical testing this does not seem to be the case.
So if I start a backup and then call the API that lists all backups, I get an empty list, for example:
curl -X POST http://localhost:8983/v2/collections/backups -H 'Content-Type: application/json' -d '
{
"list-backups" : {
"name": "openaccess-v26-backup",
"location": "/var/solr/mounted-efs-backup"
}
}'
{
"responseHeader":{
"status":0,
"QTime":165},
"backups":[]
}
However, if you run it again later, once the backup has completed, the response looks like this:
{
"responseHeader":{
"status":0,
"QTime":14},
"collection":"openaccess-v26",
"backups":[{
"indexFileCount":0,
"indexSizeMB":0.0,
"shardBackupIds":{
"shard2":"md_shard2_0.json",
"shard3":"md_shard3_0.json",
"shard1":"md_shard1_0.json"},
"collection.configName":"openaccess-v26",
"backupId":0,
"collectionAlias":"openaccess-v26",
"startTime":"2022-07-05T08:34:53.703175Z",
"indexVersion":"8.9.0"}]}
This approach works fine for the Solr 8.9 version I'm using with the v2 API.
I was able to restore and use the backups without any issues once they were listed.
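A small Python sketch of the "is it listed yet?" check. It relies on the empirical observation above that an unfinished backup does not show up in the list, which is not a documented guarantee, and the function name listed_backup_ids is made up for illustration.

```python
import json


def listed_backup_ids(list_backups_response):
    """Return the backupId values found in a list-backups JSON response."""
    body = json.loads(list_backups_response)
    return [entry["backupId"] for entry in body.get("backups", [])]


# Trimmed versions of the two responses shown above.
while_running = '{"responseHeader": {"status": 0, "QTime": 165}, "backups": []}'
when_done = (
    '{"responseHeader": {"status": 0, "QTime": 14},'
    ' "collection": "openaccess-v26",'
    ' "backups": [{"backupId": 0, "indexVersion": "8.9.0"}]}'
)

print(listed_backup_ids(while_running))
print(listed_backup_ids(when_done))
```

An empty list would be read as "not finished (or not started)", and a non-empty one as "at least one backup is available to restore".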

Solr 8 upgrade and stream.body

I'm upgrading Solr from 6.x to 8.x. In the past, we used to build our request thusly in our PHP script:
$aPostData = array(
'stream.body' => '{"add": {"doc":{...stuff here...}}}',
'commit' => 'true',
'collection' => 'mycollection',
'expandMacros' => 'false'
);
$oBody = new \http\Message\Body();
$oBody->addForm($aPostData);
We then send it to our Solr server at /solr/mycollection/update/json. That worked just fine in 6.x, but now that I've upgraded to 8.x, I'm receiving the following response from Solr:
{
"responseHeader":{
"status":400,
"QTime":1
},
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"missing content stream",
"code":400
}
}
Digging around I ran across the following
https://issues.apache.org/jira/browse/SOLR-10748
and
Solr error - Stream Body is disabled
I tried following the suggestions of both answers. For the first one, I now see a file called "configoverlay.json" in my ./conf directory and it has those settings. For the second one, I set up my requestParsers node with those attributes. Neither worked. I've searched around, but at this point I'm at my wits' end. How can I keep using "stream.body"? If I shouldn't be using "stream.body", is there some other request variable that I can or should use when sending my data? I couldn't find anything in the documentation. Perhaps I was looking in the wrong place?
Any help would be greatly appreciated.
thnx,
Christoph

Clearing SOLR 7.1 index

I have been using SOLR 4.10.2, and am getting ready to migrate to 7.1
Under 4.10.2 I was able to clear an index with the following:
var address = @"http://mysolrserver:8983/solr/mysolrcore/update?stream.body=<delete><query>(*:*)</query></delete>&commit=true";
WebClient client = new WebClient();
client.DownloadString(address).Dump();
When I try this against a SOLR 7.1 server, I get a response 400 - Bad request.
{
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"Stream Body is disabled. See http://lucene.apache.org/solr/guide/requestdispatcher-in-solrconfig.html for help",
"code":400}}
I went into solrconfig.xml for the core and set the requestParsers element to
<requestParsers enableRemoteStreaming="true"
multipartUploadLimitInKB="2048000"
formdataUploadLimitInKB="2048"
addHttpRequestToContext="false"/>
but I still get the same error.
Since 7.1 is now json by default, I have tried adding
&wt=xml
to the end of the url, but I get the same result: 400 - Bad Request
Any ideas?
You're switching the wrong parameter. If you want to allow stream.body in the URL, you have to set enableStreamBody="true". enableRemoteStreaming controls stream.file and stream.url which can be used to read from remote locations.
I ran the call below in Postman; after that, the delete query worked fine.
curl http://localhost:8983/solr/CORENAME/config -H 'Content-type:application/json' -d'{
"set-property" : {"requestDispatcher.requestParsers.enableRemoteStreaming":true},
"set-property" : {"requestDispatcher.requestParsers.enableStreamBody":true}
}'
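As an alternative to enabling stream.body, the delete command can be sent as the body of a POST to the update handler, which does not depend on enableStreamBody. The Python sketch below only builds the request (nothing is sent); the function name build_delete_all_request is made up, and the host and core names are the placeholders from the question.

```python
import json
import urllib.request


def build_delete_all_request(base_url, core):
    """Build a POST that deletes every document and commits, with the
    command carried in the request body rather than in stream.body."""
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/solr/{core}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_delete_all_request("http://mysolrserver:8983", "mysolrcore")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

The same idea (put the JSON in the request body with a JSON content type) may also apply to clients that previously passed documents through stream.body, though that has not been verified against the PHP setup described earlier.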

solr highlighting shows empty result

Hi, I am new to Solr and am using Solr 7.0.0 on Windows 7.
I created a collection and indexed a folder of PDF and HTML files using the following command:
> java -jar -Dc=guidanceDoc -Dauto example\exampledocs\post.jar M:\Projects\guidance\documents\*
If I run a query, I get results. However, if I set hl=on, I get a highlighting section without any text.
Here is the query:
http://localhost:8983/solr/guidanceDoc/select?hl.fl=_text_&hl=on&%20q=_text_:"Home%20Use"
Here is the highlight part of the result:
"highlighting":{
"M:\\Projects\\g1\\documents\\gg331681":{},
"M:\\Projects\\g1\\documents\\gg209337":{},
"M:\\Projects\\g1\\documents\\ggM380327":{},
"M:\\Projects\\g1\\documents\\gg470201":{},
"M:\\Projects\\g1\\documents\\gg507278":{},
"M:\\Projects\\g1\\documents\\gg073767":{},
"M:\\Projects\\g1\\documents\\gg380325":{},
"M:\\Projects\\g1\\documents\\gg484345":{},
"M:\\Projects\\g1\\documents\\gg259760":{}}}
How can I make it work?
The field you highlight on must be marked as stored=true. Since you're running Solr in cloud mode, I recommend using the Schema API to change the field definition:
curl -X POST -H 'Content-type:application/json' --data-binary '{
"replace-field":{
"name":"hl_field",
"type":"text",
"stored":true }
}' http://localhost:8983/solr/guidanceDoc/schema
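To check whether the change took effect (documents generally have to be reindexed before a changed stored setting shows up), one option is to look for empty entries in the highlighting section of a response. A minimal Python sketch, with a made-up function name and a made-up second sample entry:

```python
def docs_without_snippets(response):
    """Return the ids whose entry in the 'highlighting' section is empty."""
    highlighting = response.get("highlighting", {})
    return [doc_id for doc_id, fields in highlighting.items() if not fields]


# The first entry has the shape from the question; the second is an invented
# example of what a populated entry could look like.
sample = {
    "highlighting": {
        "M:\\Projects\\g1\\documents\\gg331681": {},
        "doc-with-snippet": {"_text_": ["<em>Home</em> <em>Use</em> devices"]},
    }
}
print(docs_without_snippets(sample))
```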

Error when adding user for Solr Basic Authentication

When I try to add users for Solr Basic Authentication with the following curl command:
curl --user user:password http://localhost:8983/solr/admin/authentication -H 'Content-type:application/json' -d '{
"set-user": {"tom" : "TomIsCool" ,
"harry":"HarrysSecret"}}'
I get the following error:
{
"responseHeader":{
"status":400,
"QTime":0},
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"No contentStream",
"code":400}}
curl: (3) [globbing] unmatched brace in column 1
枩]?V7`-{炘9叡 t肤 ,? E'qyT咐黣]儎;衷 鈛^W褹?curl: (3) [globbing] unmatched cl
ose brace/bracket in column 13
What does this error mean and how should we resolve it?
I'm using SolrCloud on Solr 6.4.2.
Regards,
Edwin
If you're using curl under Windows, this is a known issue: cmd.exe does not treat single quotes as quoting characters. Use double quotes around your JSON string and escape the double quotes inside it (or use cygwin, powershell, etc.):
curl --user user:password http://localhost:8983/solr/admin/authentication -H "Content-type:application/json" -d "{\"set-user\": {\"tom\":\"TomIsCool\", \"harry\":\"HarrysSecret\"}}"
The "globbing" message from curl is the hint that curl is doing something else than what you intended, and that the actual body of the request isn't getting to Solr (which is complaining about no message body being present).
You could also get around this by using stream.body in the URL and making the request from your browser.
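Another way to sidestep shell quoting altogether is to let a script produce the JSON and pass the arguments as a list. A minimal Python sketch, assuming curl is on the PATH; the function name build_set_user_argv is made up, and the credentials are the placeholders from the question.

```python
import json


def build_set_user_argv(url, admin_credentials, users):
    """Build a curl argument list; json.dumps handles the JSON quoting, and
    passing a list (not a string) means no shell re-parses the quotes."""
    body = json.dumps({"set-user": users})
    return ["curl", "--user", admin_credentials, url,
            "-H", "Content-type:application/json", "-d", body]


argv = build_set_user_argv(
    "http://localhost:8983/solr/admin/authentication",
    "user:password",
    {"tom": "TomIsCool", "harry": "HarrysSecret"},
)
print(argv[-1])
# To run it without going through a shell: subprocess.run(argv, check=True)
```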
