Delete all documents from Solr that have a certain empty field - solr

Querying for those documents works with: "fq=-myfield:[* TO *]".
But how can I delete all those? It seems that the delete syntax update?stream.body=<delete><query>... accepts only a query, no filters...

Pass only -myfield[* TO *] inside the query tag, and do not pass the fq parameter; I think that will work. I once had to delete all documents whose id contained the word "data", so I passed id:*data* between the query tags, and it worked. Let me know if that helps.

The correct query is -myfield:* or -myfield:[* TO *]; either way, the : is mandatory.
Here is an example with curl:
curl http://localhost:8983/solr/collection/update \
-H "Content-Type: text/xml" \
--data-binary '<delete><query>-myfield:*</query></delete>'
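
For anyone scripting this instead of calling curl, here is a minimal Python sketch of the same request. It only builds the request object and does not send it. The function name build_delete_missing_field_request is made up for illustration; the URL and field name are the ones from the curl example above.

```python
from urllib.request import Request
from xml.sax.saxutils import escape


def build_delete_missing_field_request(update_url, field):
    """Build (but do not send) a delete-by-query request for docs lacking `field`."""
    # A leading "-" negates the clause; "field:*" matches any document with a value.
    query = f"-{field}:*"
    # Escape XML special characters in case the query ever contains <, > or &.
    body = f"<delete><query>{escape(query)}</query></delete>"
    return Request(
        update_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


request = build_delete_missing_field_request(
    "http://localhost:8983/solr/collection/update", "myfield"
)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it. Like the curl example, this sketch does not add a commit parameter, so the deletion may not be visible until a commit happens.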

Related

Document is not returned when searched using query parameter in solr

I updated a document in Solr using the request below, and it succeeded. The document has other fields, such as organization_name and place, apart from those shown in the request.
curl -X POST -H 'Content-Type: application/json' 'http://localhost:8390/solr/collection/update?commit=true' -d '[{"id" : "12345","code" : {"set" : "500"}}]'
{
"responseHeader":{
"rf":1,
"status":0,
"QTime":11}}
After the update, when I query Solr with the name in the query parameter (q), no document is returned. However, if I pass the name in the fq parameter, the document comes back fine.
This query does not work:
http://localhost:8390/solr/collection/select?q=test&qf=content^0.1%20name_display_name^1.0&defType=edismax
But this query works (with the fq parameter):
http://localhost:8390/solr/collection/select?q=*%3A*&rows=1000&fq=ngram_info_organization_name:test
The field type of organization_name is string, and it is both indexed and stored.
The issue occurs only for the document that I updated. If I query for other documents that were not updated, I see the results.
Please help me figure out why the document is not listed when I use the name in the query parameter.

DisMax query parser is not running

Environment: solr-8.9.0
Movie data in the form of a .csv file has been indexed in Apache Solr.
Movie data:
name,directed_by,genre,type,id,initial_release_date
.45,Gary Lennon,Black comedy|Thriller|Psychological thriller|Indie film|Action Film|Crime Thriller|Crime Fiction|Drama,,/en/45_2006,2006-11-30
9,Shane Acker,Computer Animation|Animation|Apocalyptic and post-apocalyptic fiction|Science Fiction|Short Film|Thriller|Fantasy,,/en/9_2005,2005-04-21
Bomb the System,Adam Bhala Lough,Crime Fiction|Indie film|Coming of age|Drama,,/en/bomb_the_system,
movie_name,Adam Bhala,Animation|Indie film|Coming of age|Drama,,/en/bomb_the_system,
I am running the DisMax query parser against the 'directed_by' field, which is mapped to the 'text_general' field type in managed-schema.
I ran the following query against Solr:
curl -G http://localhost:8983/solr/testCore1/select --data-urlencode "q=directed_by:'Adam Bhala Lough~'" --data-urlencode "defType=dismax" --data-urlencode "mm=2"
but the above query returns 0 for 'numFound' in the response.
I expect the following values to match:
Adam Bhala
Adam Bhala Lough
I expect the following value not to match:
Adam Gary Lennon
The data is indexed in Solr, though. I cross-checked this by running the following query without the DisMax query parser, and it returns results.
curl -G http://localhost:8983/solr/testCore1/select --data-urlencode "q=directed_by:'Adam Bhala Lough~'"
Why is the DisMax query parser not working as it should? I understand that the mm (Minimum Should Match) parameter defines the minimum number of clauses that must match.
I have spent hours trying to solve this, but I cannot find anything that helps.
Could someone help me find the missing piece?
Reference
https://solr.apache.org/guide/8_9/the-dismax-query-parser.html
How to use disMax query parser in solr
You need to use the qf (query fields) parameter to specify which field(s) the search should run against, optionally with a boost factor that increases or decreases that field's importance in the query (defaults to 1):
q=Adam Bhala Lough~
qf=directed_by
Or, use edismax query parser instead of dismax, so that the standard query syntax you were using (directed_by:'Adam Bhala Lough~') can work as intended.
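
As a rough illustration of the first option, the sketch below assembles the request parameters in Python, with the field moved out of q and into qf. The helper name build_dismax_params is hypothetical, and the q, qf and mm values are simply the ones from this thread; whether they return the matches you expect depends on your schema and data.

```python
from urllib.parse import parse_qs, urlencode


def build_dismax_params(user_query, fields, mm=None):
    """Return a URL-encoded query string for a DisMax request.

    `fields` maps a field name to its boost; a boost of None means
    "no explicit boost", so the field is listed on its own in qf.
    """
    qf = " ".join(
        name if boost is None else f"{name}^{boost}"
        for name, boost in fields.items()
    )
    params = {"q": user_query, "defType": "dismax", "qf": qf}
    if mm is not None:
        params["mm"] = str(mm)
    return urlencode(params)


query_string = build_dismax_params("Adam Bhala Lough~", {"directed_by": None}, mm=2)
print(query_string)
# Decode it again to check which parameters would reach Solr.
print(parse_qs(query_string))
```

Appending the resulting string to http://localhost:8983/solr/testCore1/select? gives the same request that curl -G with --data-urlencode would produce.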

Apache solr fuzzy search on list values

Environment: solr-8.9.0
To run a fuzzy search on the "name" column of a CSV file in Apache Solr (fuzzy search for 'alaistiar~'), I am issuing the following query:
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:'alaistiar~'&wt=json
To run a fuzzy search on the same "name" column for 'shanka~', I issue:
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:'shanka~'&wt=json
Can I combine both queries into a single one and still find the documents?
My first HTTP request does a fuzzy search for alaistiar~ on the name column and returns some score values; the second does the same for shanka~. If I combine both with the 'OR' operator, will it behave the same as the individual requests? My goal is to avoid issuing multiple HTTP requests for multiple names. I also want the output to indicate which fuzzy search name each document matched, i.e. this document is for alaistiar~ and that document is for shanka~.
I have loaded a 5 GB CSV file with 4 columns and 100 million records. The file has the following column names:
'name', 'father_name', 'date_of_passing','admission_number'
I have created an index on the 'name' column. To do this, I executed the following curl requests against the managed-schema (solr-8.9.0, jdk-11.0.12):
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"name","type":"text_general","stored":true,"indexed":true }}' http://localhost:8983/solr/bigboxstore/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"father_name","type":"text_general","stored":true,"indexed":false }}' http://localhost:8983/solr/bigboxstore/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"date_of_passing","type":"pdate","stored":true,"indexed":false }}' http://localhost:8983/solr/bigboxstore/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"admission_number","type":"text_general","stored":true,"indexed":false }}' http://localhost:8983/solr/bigboxstore/schema
Is this the right way to index only one column (name), as described above?
Now I have a list of 1 million names. For each name, I have to do a fuzzy search (on the name column) against the already loaded data. In the output, for each name, I have to return a list of Java objects including all 4 columns of the CSV file.
Note: the output must also include the name that was supplied as input (in the where clause).
For each name, I am doing a fuzzy search as follows:
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:'alaistiar~'&wt=json.
This requires 1 million HTTP requests, which I want to avoid. Can I do it in a single HTTP request instead?
I understand that the 'OR' operator will not solve my problem, because I will not be able to group the output documents by the name that was passed as input.
Yes, you could unify the queries by using "OR":
name:alaistiar~ OR name:shanka~
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:alaistiar~ OR name:shanka~&wt=json
You could omit "OR" if your default operator is "OR". The query would look like:
name:alaistiar~ name:shanka~
http://localhost:8983/solr/bigboxstore/select?indent=on&q=name:alaistiar~ name:shanka~&wt=json
Of course, the spaces must be escaped in the URL.
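
If you build the URL in code, the escaping can be left to a library. Here is a small Python sketch of the idea; build_fuzzy_or_url is a made-up helper name, and it assumes every name is a single plain term with no Solr special characters that would need their own escaping.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_fuzzy_or_url(base_url, field, names):
    """Join one fuzzy clause per name with OR and URL-encode the result."""
    # Assumption: each name is a simple term (no spaces, quotes, colons, ...).
    clauses = [f"{field}:{name}~" for name in names]
    params = {"indent": "on", "q": " OR ".join(clauses), "wt": "json"}
    # quote_via=quote encodes spaces as %20 rather than "+".
    return base_url + "?" + urlencode(params, quote_via=quote)


url = build_fuzzy_or_url(
    "http://localhost:8983/solr/bigboxstore/select", "name", ["alaistiar", "shanka"]
)
print(url)
# Decoding the query string again shows the query Solr would receive.
print(parse_qs(urlsplit(url).query)["q"][0])
```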
Hello again. After you edited your question, it is clearer what you are looking for:
1. only one query for 1 million names
2. a way to see in the result which document corresponds to which name
There is a solution, but you have to do some post-processing. You can use a POST request with JSON params (for 1) and hit highlighting (for 2), like I did here:
curl 'http://localhost:8983/solr/bigboxstore/query?hl.fl=name&hl.simple.post=</b>&hl.simple.pre=<b>&hl=on' -H "Content-Type: application/x-www-form-urlencoded" -X POST -d 'json={"query":"name:alaistiar~ name:shanka~"}'
The response contains two parts: the first with the results and the second with the ids and the highlights. You will have to pair them up by id after receiving the response.
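
The pairing step could look roughly like the Python sketch below. It assumes the usual Solr JSON layout, with the documents under response.docs and a highlighting section keyed by document id, and it assumes the unique key field is called id. The sample response is hand-written for illustration and is not real Solr output.

```python
def pair_docs_with_highlights(response, field="name", id_field="id"):
    """Attach the highlight snippets for `field` to each returned document."""
    highlights = response.get("highlighting", {})
    paired = []
    for doc in response.get("response", {}).get("docs", []):
        # The highlighting section is keyed by the document's unique key.
        snippets = highlights.get(str(doc[id_field]), {}).get(field, [])
        paired.append({"doc": doc, "snippets": snippets})
    return paired


# Hand-written sample in the assumed response shape (not real output).
sample_response = {
    "response": {
        "docs": [
            {"id": "1", "name": "alastair"},
            {"id": "2", "name": "shankar"},
        ]
    },
    "highlighting": {
        "1": {"name": ["<b>alastair</b>"]},
        "2": {"name": ["<b>shankar</b>"]},
    },
}

for item in pair_docs_with_highlights(sample_response):
    print(item["doc"]["id"], item["snippets"])
```

Each snippet shows the indexed term that matched, so you would still need your own step to map a matched term back to the input name that produced it.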

Solr add field to schema using curl

A:\DOS> curl -X POST -H 'Content-type:application/json' \
--data-binary '{"add-field":{"name":"timestamp","type":"date","indexed":true,"stored":true,\
"default":NOW,"multiValued":false}}' http://localhost:8983/solr/testt/schema
{
"reponseHeader":{
"status":0,
"QTime":0},
"errors":"no stream"}
I am trying to add a 'timestamp' field to solr and this is the error which I am getting. Can anyone help me figure out where I am wrong in this?
There may be two problems:
1. &commit=true: the commit parameter was not added at the end of the URL.
2. schema.xml: the schema does not contain the field you wanted to set.
There is a problem with your curl command, because it works if you pass the payload from a file instead:
curl -d '@timestamp.json' -X POST -H 'Content-type:application/json' http://localhost:8983/solr/testt/schema
and create the timestamp.json file:
{
"add-field":{"name":"timestamp","type":"date","indexed":true,"stored":true,"default":"NOW","multiValued":false}
}
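
If you would rather generate the payload than hand-write it, a JSON library takes care of the quoting. A minimal Python sketch follows; build_add_field_command is a hypothetical helper, and writing the default as the string "NOW" is my assumption about how that value should be expressed in strict JSON.

```python
import json


def build_add_field_command(name, field_type, **options):
    """Build the add-field command for the Schema API as a plain dict."""
    field = {"name": name, "type": field_type}
    field.update(options)
    return {"add-field": field}


command = build_add_field_command(
    "timestamp",
    "date",
    indexed=True,
    stored=True,
    default="NOW",  # assumption: the default is passed as a JSON string
    multiValued=False,
)
payload = json.dumps(command, indent=2)
print(payload)
```

Saving the printed text as timestamp.json gives a file you can post with curl as above.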

How to delete a doc at a specific shard in Solr

I want to delete a specific doc from a specific shard in Solr. Below is my query:
http://localhost:8080/solr/collections_1_replica1/update?stream.body=<delete><query>id:1</query></delete>&commit=true&distrib=false
But this still affects collections_2_replica1, so what is the correct query in this case?
If you use the default SolrCloud collection configuration, Solr chooses where to put a document based on the document id (roughly docId.hash() % number of shards).
In other words, you're not supposed to delete from a specific shard because you can't be sure whether the document is there or on the other shards.
If I'm not wrong, the distrib=false parameter has no effect on updates.
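
To make the routing idea concrete, here is a toy Python sketch of the simplified "hash of the id modulo number of shards" model described above. It is not Solr's actual routing code; zlib.crc32 is only a stand-in hash, and pick_shard is a made-up name.

```python
import zlib


def pick_shard(doc_id, num_shards):
    """Toy model: map a document id to a shard index via hash modulo."""
    # Stand-in hash for illustration only; Solr does not use CRC32 here.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


for doc_id in ["1", "2", "3"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 2))
```

The point is only that the target shard is derived from the id, so the client does not pick it.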
Try this one:
curl http://xx.xx.xx.xx:8983/solr/collection_name/update/?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>id:1</query></delete>'
