Document is not returned when searched using the query parameter in Solr

I updated a document in Solr using the request below, and it succeeded. Besides the fields shown in the request, the document has other fields such as organization_name and place.
curl -X POST -H 'Content-Type: application/json' 'http://localhost:8390/solr/collection/update?commit=true' -d '[{"id" : "12345","code" : {"set" : "500"}}]'
{
"responseHeader":{
"rf":1,
"status":0,
"QTime":11}}
After the update, when I query Solr with the name as the query parameter (q), no document is returned. If I pass the name in the fq parameter instead, the document comes back fine.
This query does not work:
http://localhost:8390/solr/collection/select?q=test&qf=content^0.1%20name_display_name^1.0&defType=edismax
But this query works (with the fq parameter):
http://localhost:8390/solr/collection/select?q=*%3A*&rows=1000&fq=ngram_info_organization_name:test
The field type of organization_name is string, and it is both indexed and stored.
The issue occurs only for the document that I updated. If I query for other documents that were not updated, I see the results.
Please help me figure out why the document is not listed when I use the name in the query parameter.

Related

DisMax query parser is not running

Environment: solr-8.9.0
Movie data in the form of a .csv file has been indexed in Apache Solr.
Movies data
name,directed_by,genre,type,id,initial_release_date
.45,Gary Lennon,Black comedy|Thriller|Psychological thriller|Indie film|Action Film|Crime Thriller|Crime Fiction|Drama,,/en/45_2006,2006-11-30
9,Shane Acker,Computer Animation|Animation|Apocalyptic and post-apocalyptic fiction|Science Fiction|Short Film|Thriller|Fantasy,,/en/9_2005,2005-04-21
Bomb the System,Adam Bhala Lough,Crime Fiction|Indie film|Coming of age|Drama,,/en/bomb_the_system,
movie_name,Adam Bhala,Animation|Indie film|Coming of age|Drama,,/en/bomb_the_system,
The DisMax query parser has been run over the field 'directed_by'. The 'directed_by' field is mapped to the 'text_general' field type in managed-schema.
I ran the following query against Solr:
curl -G http://localhost:8983/solr/testCore1/select --data-urlencode "q=directed_by:'Adam Bhala Lough~'" --data-urlencode "defType=dismax" --data-urlencode "mm=2"
but the above query returns a 'numFound' of 0 in the response.
I expect the following values to match:
Adam Bhala
Adam Bhala Lough
I expect the following value not to match:
Adam Gary Lennon
The data is indexed in Solr, though. I cross-checked this by running the following query without the DisMax query parser, and it returns results.
curl -G http://localhost:8983/solr/testCore1/select --data-urlencode "q=directed_by:'Adam Bhala Lough~'"
Why is the DisMax query parser not working as it should? I understand that the mm (Minimum Should Match) parameter defines the minimum number of clauses that must match.
I have spent hours trying to solve this, but I cannot find anything that walks me through it.
Could someone help me find the missing piece?
Reference
https://solr.apache.org/guide/8_9/the-dismax-query-parser.html
How to use disMax query parser in solr
You need to use the qf (query fields) parameter to specify the field(s) in which the search should be performed, optionally along with a boost factor that increases or decreases that field's importance in the query (defaults to 1):
q=Adam Bhala Lough~
qf=directed_by
Or use the edismax query parser instead of dismax, so that the standard query syntax you were using (directed_by:'Adam Bhala Lough~') can work as intended.
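As a rough illustration of the qf suggestion above, the request parameters could be assembled like this. It is only a sketch: the helper name dismax_params is made up for this example, the fuzzy marker is left out of q, and nothing is sent to Solr; the code just builds and prints the query string.

```python
from urllib.parse import urlencode

def dismax_params(user_query, fields, mm=None, parser="dismax"):
    """Build request parameters for a dismax/edismax search.

    fields maps a field name to its boost, e.g. {"directed_by": 1.0}.
    (Hypothetical helper, not part of any Solr client library.)
    """
    # qf is a space-separated list of field^boost entries
    qf = " ".join(f"{name}^{boost}" for name, boost in fields.items())
    params = {"q": user_query, "defType": parser, "qf": qf}
    if mm is not None:
        params["mm"] = str(mm)
    return params

# The plain search terms go in q; the field to search goes in qf.
params = dismax_params("Adam Bhala Lough", {"directed_by": 1.0}, mm=2)
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended after select? on the core URL from the question, or the same parameters can be passed to curl with --data-urlencode as in the original command.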

Apache solr fuzzy search on list values

Environment: solr-8.9.0
To implement fuzzy search on the column "name" (fuzzy search for 'alaistiar~') of a CSV file in Apache Solr, I am issuing the following query:
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:'alaistiar~'&wt=json
To implement fuzzy search on the column "name" (fuzzy search for 'shanka~') of a CSV file in Apache Solr, I use:
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:'shanka~'&wt=json
Can I combine both of the above queries into a single one and still find the documents?
My first HTTP request does a fuzzy search for the value alaistiar~ on the name column and returns some score value, and the second HTTP request does the same for shanka~. When I combine both with the 'OR' operator, will it behave the same as the individual requests? Actually, my purpose is that I don't want to invoke multiple HTTP requests for multiple names. I also want the fuzzy-searched name in the output, indicating that this document is for the name alaistiar~ and that document is for the name shanka~.
I have loaded a CSV file (size: 5 GB) with 4 columns and 100 million records. The .csv file has the following column names:
'name', 'father_name', 'date_of_passing','admission_number'
I have created an index on the column 'name'. To do this, I executed the following curl requests against managed-schema (solr-8.9.0, jdk-11.0.12):
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"name","type":"text_general","stored":true,"indexed":true }}' http://localhost:8983/solr/bigboxstore/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"father_name","type":"text_general","stored":true,"indexed":false }}' http://localhost:8983/solr/bigboxstore/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"date_of_passing","type":"pdate","stored":true,"indexed":false }}' http://localhost:8983/solr/bigboxstore/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"admission_number","type":"text_general","stored":true,"indexed":false }}' http://localhost:8983/solr/bigboxstore/schema
Is this the right way to create an index on one column (only on name), as described above?
Now I have a list of 1 million names. For each name, I have to do a fuzzy search (column: name) on the already loaded data. In the output, for each name, I have to return a list of Java objects including all 4 columns of the .csv file.
Note: in the output I also have to include the name that was supplied as input (in the where clause).
For each name, I am doing a fuzzy search as follows:
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:'alaistiar~'&wt=json.
To do this I have to execute 1 million HTTP requests, which I don't want. Instead of executing 1 million HTTP requests, can I do it in a single HTTP request?
I understand that the 'OR' operator will not solve my problem, because I will not be able to group the output documents by the name that was passed as input.
Yes, you could unify the queries by using "OR":
name:alaistiar~ OR name:shanka~
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:alaistiar~ OR name:shanka~&wt=json
You could omit "OR" if your default operator is "OR". The query would look like:
name:alaistiar~ name:shanka~
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:alaistiar~ name:shanka~&wt=json
Of course the "space" should be escaped in URL.
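To make the OR-joined query and the URL escaping concrete, here is a small sketch. The helper fuzzy_or_query is invented for this example, and it assumes each name is a single token with no characters that are special in Solr query syntax (those would need extra escaping).

```python
from urllib.parse import quote

def fuzzy_or_query(field, names):
    """Join one fuzzy clause per name with an explicit OR.

    Hypothetical helper; assumes plain single-token names.
    """
    return " OR ".join(f"{field}:{name}~" for name in names)

q = fuzzy_or_query("name", ["alaistiar", "shanka"])

# quote() percent-encodes the spaces and colons so the value is safe in a URL
url = "http://localhost:8983/solr/bigboxstore/select?indent=on&wt=json&q=" + quote(q)
print(q)
print(url)
```

For a very long list of names the query string grows with every clause, so sending the query in a POST body is likely to be more practical than a GET URL.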
Hello, again. After you edited your question, it is more clear what you are looking for:
only 1 query for 1 mil names
see in the result which response to which name corresponds
There is a solution, but you have to do some post-processing. You can use a POST request with the parameters in JSON (for 1), and you can use hit highlighting (for 2), like I did here:
curl 'http://localhost:8983/solr/bigboxstore/query?hl.fl=name&hl.simple.post=</b>&hl.simple.pre=<b>&hl=on' -H "Content-Type: application/x-www-form-urlencoded" -X POST -d 'json={"query":"name:alaistiar~ name:shanka~"}'
The answer contains two parts: the first one with the result and the second one with the id and the highlights -> you will have to pair them up on id after receiving the response.
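The pairing step described above could look roughly like the following. This is a sketch under assumptions: the sample dictionary is hand-written to mimic the shape of a Solr JSON response (a response.docs list plus a highlighting section keyed by document id), the ids and names in it are invented, and both helper functions are hypothetical.

```python
import re

def matched_terms(snippets):
    """Return the terms wrapped in <b>...</b> in a list of highlight snippets."""
    terms = []
    for snippet in snippets:
        terms.extend(re.findall(r"<b>(.*?)</b>", snippet))
    return terms

def pair_docs_with_highlights(result, field="name", id_field="id"):
    """Attach the highlighted terms to each returned document, joined on id."""
    highlighting = result.get("highlighting", {})
    paired = []
    for doc in result["response"]["docs"]:
        snippets = highlighting.get(str(doc[id_field]), {}).get(field, [])
        paired.append({"doc": doc, "matched": matched_terms(snippets)})
    return paired

# Invented sample data, shaped like the two parts of the response
sample = {
    "response": {"numFound": 2, "docs": [
        {"id": "1", "name": "alaistar"},
        {"id": "2", "name": "shankar"},
    ]},
    "highlighting": {
        "1": {"name": ["<b>alaistar</b>"]},
        "2": {"name": ["<b>shankar</b>"]},
    },
}

pairs = pair_docs_with_highlights(sample)
for pair in pairs:
    print(pair["doc"]["id"], pair["matched"])
```

Note that the highlight shows the term found in the document, not the input name that produced the match, so mapping each hit back to an input name would still need a further comparison step.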

Using Apache Solr to do faceted search for unstructured text

I have a set of text documents that I have indexed in my respective collection defined in Solr. I am able to do a keyword based search that returns the required documents having the typed in keyword. But my next objective is to do faceted search on unstructured text wherein I am able to retrieve results based on facet fields.
I have tried the following steps:
1) I defined a new field (distributioncompany) in managed-schema to act as the facet field, along with a copy field (distributioncompany_str). But when I index with a curl command (passing the id and the distribution company name as arguments), I get the facet counts with q=*, but not when I type a keyword in the q field.
2) I also tried the text tagger feature, with a tag field defined for the facet field, so that the required entity can be extracted from the document and matched against a list of tag values. But the tag field is not returned.
For 1st approach:
1)
curl -X POST -H 'Content-type:text/plain' --data-binary '{"add-field":{"name":"distributioncompany_str","type":"string","multiValued":true,"indexed":true,"stored":false}}' http://localhost:8983/solr/collectionname/schema
(same code used for adding distributioncompany field)
2) Copy field added:
curl -X POST -H 'Content-type:text/plain' --data-binary '{"add-copy-field":{"source":"distributioncompany","dest":"distributioncompany_str"}}' http://localhost:8983/solr/collectionname/schema
3) Added a new document to index:
curl 'http://localhost:8983/solr/collectionname/update/json/docs' -H 'Content-type:text/plain' -d '{"id":"Appeal No. 220 of 2013.pdf.txt","distributioncompany":"Himachal Pradesh State Electricity Board"}'
But if the query is done using q=*, it shows the facet field count; if the query is done using a keyword present in the document, the count does not show up.
For 2nd approach ( text tagger)
1) Added a new fieldtype "tag" in schema of collection
2) Added new fields: a) trancompany (type: text_general), b) trancompany_tag (type: tag), c) a copy field for these 2 fields
3) Added a new custom request handler in the solrconfig file:
curl -X POST -H 'Content-type:application/json' http://192.168.0.95:8983/solr/rajdhanitest2/config -d '{
"add-requesthandler":{
"name":"/tag",
"class":"solr.TaggerRequestHandler",
"defaults":{"field":" trancompany_tag"}
}
}'
4) Updated the values of the tag field "trancompany_tag" with a curl command
5) But on passing text containing one of the updated tag values, only the id gets returned, not the tag value
For both approaches, the required faceted/tagged field to be extracted from the text document is not displayed when the search query is run. I would appreciate guidance on how to do a faceted search over unstructured text documents.
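For reference, a keyword search that also asks for facet counts is typically expressed with the parameters sketched below. The helper facet_params and the keyword value are illustrative only, and the field name is the one from the first approach; the code just builds the parameters and does not contact Solr. Facet counts are computed over the documents that match q, so a document contributes to a count only if it both matches the keyword and has a value in the facet field.

```python
from urllib.parse import urlencode

def facet_params(keyword, facet_field, rows=10):
    """Build select parameters for a keyword search with facet counts.

    Hypothetical helper for illustration.
    """
    return {
        "q": keyword,
        "rows": str(rows),
        "facet": "true",             # switch faceting on
        "facet.field": facet_field,  # field whose values are counted
        "facet.mincount": "1",       # hide values with zero matching documents
    }

# "electricity" is an invented example keyword
params = facet_params("electricity", "distributioncompany_str")
query_string = urlencode(params)
print(query_string)
```

The printed string can be appended after select? on the collection URL to try the request by hand.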

Delete all documents from Solr that have a certain empty field

Querying for those documents works with: "fq=-myfield:[* TO *]".
But how can I delete all those? It seems that the delete syntax update?stream.body=<delete><query>... accepts only a query, no filters...
Only pass -myfield[* TO *] in query tag. Do not pass fq parameter. Then it will work I feel. Once I had to delete all documents with id that contained word "data" in the id field string, I just passed id:*data* between query tags, and it worked. Let me know if that helps you.
The correct answer should be: -myfield:* or even -myfield:[* TO *], but the : is mandatory.
This is an example with curl:
curl http://localhost:8983/solr/collection/update \
-H "Content-Type: text/xml" \
--data-binary '<delete><query>-myfield:*</query></delete>'
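If the delete request is built from code rather than typed by hand, the XML body can be generated with the query escaped for XML, roughly as follows. The helper name delete_by_query_body is made up for this sketch, and the body is only printed, not posted.

```python
from xml.sax.saxutils import escape

def delete_by_query_body(query):
    """Wrap a Solr query in a delete-by-query XML body.

    escape() protects the characters &, < and > inside the query.
    (Hypothetical helper for illustration.)
    """
    return f"<delete><query>{escape(query)}</query></delete>"

body = delete_by_query_body("-myfield:*")
print(body)

# The range form passes through unchanged as well
print(delete_by_query_body("-myfield:[* TO *]"))
```

The string can then be sent as the request body with the Content-Type: text/xml header, as in the curl command above.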

How to delete a doc at a specific shard in Solr

I want to delete a specific doc at a specific shard in Solr, below is my query:
http://localhost:8080/solr/collections_1_replica1/update?stream.body=<delete><query>id:1</query></delete>&commit=true&distrib=false
But this still affects collections_2_replica1, so what is the correct query in this case?
If you use the default SolrCloud collection configuration, Solr chooses where to put the document according to the document id (docId.hash() % number of shards).
In other words, you're not supposed to delete from a specific shard because you can't be sure whether the document is there or on the other shards.
If I'm not wrong, the distrib=false parameter has no effect on updates.
Try this one:
curl http://xx.xx.xx.xx:8983/solr/collection_name/update/?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>id:1</query></delete>'
